# Introduction

In this guide, we will walk you through building a powerful semantic search engine using [Couchbase](https://www.couchbase.com) as the backend database and [CrewAI](https://github.com/crewAIInc/crewAI) for agent-based RAG operations. CrewAI allows us to create specialized agents that can work together to handle different aspects of the RAG workflow, from document retrieval to response generation. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch.

How to run this tutorial
----------------------
This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run 
interactively. You can access the original notebook here.

You can either:
- Download the notebook file and run it on [Google Colab](https://colab.research.google.com)
- Run it on your system by setting up the Python environment

Before you start
---------------

1. Create and Deploy Your Free Tier Operational cluster on [Capella](https://cloud.couchbase.com/sign-up)
   - To get started with [Couchbase Capella](https://cloud.couchbase.com), create an account and use it to deploy 
     a forever free tier operational cluster
   - This account provides you with an environment where you can explore and learn 
     about Capella with no time constraint
   - To learn more, please follow the [Getting Started Guide](https://docs.couchbase.com/cloud/get-started/create-account.html)

2. Couchbase Capella Configuration
   When running Couchbase using Capella, the following prerequisites need to be met:
   - Create the database credentials to access the required bucket (Read and Write) used in the application
   - Allow access to the Cluster from the IP on which the application is running by following the [Network Security documentation](https://docs.couchbase.com/cloud/security/security.html#public-access)

# Setting the Stage: Installing Necessary Libraries

We'll install the following key libraries:
- `datasets`: For loading and managing our training data
- `langchain-couchbase`: To integrate Couchbase with LangChain for vector storage and caching
- `langchain-openai`: For accessing OpenAI's embedding and chat models
- `crewai`: To create and orchestrate our AI agents for RAG operations
- `python-dotenv`: For securely managing environment variables and API keys

These libraries provide the foundation for building a semantic search engine with vector embeddings, 
database integration, and agent-based RAG capabilities.

In [1]:
%pip install --quiet datasets langchain-couchbase langchain-openai crewai python-dotenv

Note: you may need to restart the kernel to use updated packages.


# Importing Necessary Libraries
The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading.

In [2]:
import getpass
import json
import logging
import os
import time
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.diagnostics import PingState, ServiceType
from couchbase.exceptions import (InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain.tools import Tool
from langchain_couchbase.vectorstores import CouchbaseVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from crewai import Agent, Crew, Process, Task

# Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings.

In [3]:
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)

# Suppress httpx logging
logging.getLogger('httpx').setLevel(logging.CRITICAL)

# Loading Sensitive Information
This section collects the configuration the notebook needs, including sensitive values such as the OpenAI API key and the database credentials. Each setting is read from an environment variable first (optionally loaded from a `.env` file by `load_dotenv()`). If the variable is not set, you are prompted for the value at runtime, and a default is used where one is defined.

Keeping these values out of the source avoids hardcoded secrets and makes the notebook easier to reuse across environments.

In [4]:
# Load environment variables
load_dotenv()

# Configuration
# getpass hides the key while it is typed, unlike input()
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass("Enter your OpenAI API key: ")
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set")

CB_HOST = os.getenv('CB_HOST') or input("Enter Couchbase host (default: couchbase://localhost): ") or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input("Enter Couchbase username (default: Administrator): ") or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass("Enter Couchbase password (default: password): ") or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input("Enter bucket name (default: vector-search-testing): ") or 'vector-search-testing'
INDEX_NAME = os.getenv('INDEX_NAME') or input("Enter index name (default: vector_search_crew): ") or 'vector_search_crew'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input("Enter scope name (default: shared): ") or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input("Enter collection name (default: crew): ") or 'crew'

print("Configuration loaded successfully")

Configuration loaded successfully
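
The `os.getenv(...) or input(...) or default` pattern used above can be folded into one small helper. The sketch below is one possible way to do that, not part of the original notebook: `get_setting` and its `interactive` flag are hypothetical names, and passing `interactive=False` skips the prompt so the notebook can run unattended.

```python
import getpass
import os


def get_setting(name, default=None, secret=False, interactive=True):
    """Return a setting from the environment, a prompt, or a default, in that order."""
    value = os.getenv(name)
    if value:
        return value
    if interactive:
        prompt = f"Enter {name}" + (f" (default: {default}): " if default else ": ")
        # getpass hides the typed value, which suits passwords and API keys
        value = getpass.getpass(prompt) if secret else input(prompt)
        if value:
            return value
    if default is None:
        raise ValueError(f"{name} is not set")
    return default
```

With this helper, a line such as `SCOPE_NAME = get_setting('SCOPE_NAME', default='shared')` would behave like the corresponding line in the cell above.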


# Connecting to the Couchbase Cluster
Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.

In [5]:
# Connect to Couchbase
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    print("Successfully connected to Couchbase")
except Exception as e:
    print(f"Failed to connect to Couchbase: {str(e)}")
    raise

Successfully connected to Couchbase
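
If the cluster is still starting up, a single connection attempt with a short timeout can fail even though the configuration is correct. As a hedged sketch, the helper below retries any callable with exponential backoff. `retry_with_backoff` is a hypothetical name, and the attempt count and delays are illustrative assumptions rather than recommended values.

```python
import time


def retry_with_backoff(operation, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds, doubling the delay after each failure."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error unchanged
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({exc}); retrying in {delay:.1f}s")
            sleep(delay)


# Hypothetical usage with the names from the cell above:
# retry_with_backoff(lambda: cluster.wait_until_ready(timedelta(seconds=5)))
```

The `sleep` parameter is injectable only so the helper can be exercised in tests without actually waiting.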


# Verifying Search Service Availability
In this section, we verify that the Couchbase Search (FTS) service is available and responding correctly. This check is crucial because our vector search functionality depends on it. If the Search service is missing or not responding, `check_search_service()` raises an exception, so problems surface early, before any vector operations are attempted.


In [6]:
def check_search_service(cluster):
    """Verify search service availability using ping"""
    try:
        # Get ping result
        ping_result = cluster.ping()
        search_available = False
        
        # Check if search service is responding
        for service_type, endpoints in ping_result.endpoints.items():
            if service_type == ServiceType.Search:
                for endpoint in endpoints:
                    if endpoint.state == PingState.OK:
                        search_available = True
                        print(f"Search service is responding at: {endpoint.remote}")
                        break
                break

        if not search_available:
            raise RuntimeError("Search/FTS service not found or not responding")
        
        print("Search service check passed successfully")
    except Exception as e:
        print(f"Health check failed: {str(e)}")
        raise
try:
    check_search_service(cluster)
except Exception as e:
    print(f"Failed to check search service: {str(e)}")
    raise

Search service is responding at: 127.0.0.1:8094
Search service check passed successfully


## Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks if specified bucket exists, creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: You will not be able to create a bucket on Capella

2. Scope Management:  
   - Verifies if requested scope exists within bucket
   - Creates new scope if needed (unless it's the default "_default" scope)

3. Collection Setup:
   - Checks for collection existence within scope
   - Creates collection if it doesn't exist
   - Waits 2 seconds for collection to be ready

Additional Tasks:
- Creates primary index on collection for query performance
- Clears any existing documents for clean state
- Implements comprehensive error handling and logging

The function is called once below to set up the main collection, which stores the documents and their vector embeddings.


In [7]:
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)
        
        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        # Wait for collection to be ready
        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Ensure primary index exists
        try:
            cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute()
            logging.info("Primary index present or created successfully.")
        except Exception as e:
            logging.warning(f"Error creating primary index: {str(e)}")

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")
    
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)


2025-02-25 22:17:36 [INFO] Bucket 'vector-search-testing' exists.
2025-02-25 22:17:36 [INFO] Collection 'crew' does not exist. Creating it...
2025-02-25 22:17:36 [INFO] Collection 'crew' created successfully.
2025-02-25 22:17:41 [INFO] Primary index present or created successfully.
2025-02-25 22:17:41 [INFO] All documents cleared from the collection.


<couchbase.collection.Collection at 0x7fefd8d68410>

# Configuring and Initializing Couchbase Vector Search Index for Semantic Document Retrieval

Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase Vector Search Index comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.

This CrewAI vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `crew`. The configuration is set up for vectors with exactly `1536 dimensions`, using `dot product` similarity and optimized for `recall`. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly.

For more information on creating a vector search index, please follow the instructions at [Couchbase Vector Search Documentation](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html).

In [8]:
# Load index definition
try:
    with open('crew_index.json', 'r') as file:
        index_definition = json.load(file)
except FileNotFoundError as e:
    print(f"Error: crew_index.json file not found: {str(e)}")
    raise
except json.JSONDecodeError as e:
    print(f"Error: Invalid JSON in crew_index.json: {str(e)}")
    raise
except Exception as e:
    print(f"Error loading index definition: {str(e)}")
    raise
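
A mismatch between the index definition and the embedding model (for example, the wrong number of dimensions) tends to surface only at query time, so it can help to inspect the loaded definition first. The sketch below walks any nested dictionary and collects entries whose `type` is `vector`. The `find_vector_fields` helper is hypothetical, and `sample_definition` is a trimmed, assumed layout for illustration, not the actual contents of `crew_index.json`.

```python
def find_vector_fields(node):
    """Recursively collect field definitions whose "type" is "vector"."""
    found = []
    if isinstance(node, dict):
        if node.get("type") == "vector":
            found.append(node)
        for value in node.values():
            found.extend(find_vector_fields(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_vector_fields(item))
    return found


# Trimmed, assumed layout used only for illustration
sample_definition = {
    "name": "vector_search_crew",
    "sourceName": "vector-search-testing",
    "params": {"mapping": {"types": {"shared.crew": {"properties": {"embedding": {
        "fields": [{"name": "embedding", "type": "vector", "dims": 1536,
                    "similarity": "dot_product"}]
    }}}}}},
}

for vector_field in find_vector_fields(sample_definition):
    print(vector_field["name"], vector_field["dims"], vector_field["similarity"])
```

Running the same function on the loaded `index_definition` would show whether its vector field declares the 1536 dimensions and dot product similarity this tutorial expects.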

# Creating or Updating Search Indexes

With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine.

In [9]:
try:
    scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes()

    # Check if index already exists
    existing_indexes = scope_index_manager.get_all_indexes()
    index_name = index_definition["name"]

    if index_name in [index.name for index in existing_indexes]:
        logging.info(f"Index '{index_name}' found")
    else:
        logging.info(f"Creating new index '{index_name}'...")

    # Create SearchIndex object from JSON definition
    search_index = SearchIndex.from_json(index_definition)

    # Upsert the index (create if not exists, update if exists)
    scope_index_manager.upsert_index(search_index)
    logging.info(f"Index '{index_name}' successfully created/updated.")

except QueryIndexAlreadyExistsException:
    logging.info(f"Index '{index_name}' already exists. Skipping creation/update.")
except ServiceUnavailableException:
    raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.")
except InternalServerFailureException as e:
    logging.error(f"Internal server error: {str(e)}")
    raise

2025-02-25 22:17:41 [INFO] Creating new index 'vector_search_crew'...
2025-02-25 22:17:41 [INFO] Index 'vector_search_crew' successfully created/updated.


# Setting Up OpenAI Components

This section initializes two key OpenAI components needed for our RAG system:

1. OpenAI Embeddings:
   - Uses the 'text-embedding-3-small' model
   - Converts text into high-dimensional vector representations (embeddings)
   - These embeddings enable semantic search by capturing the meaning of text
   - Required for vector similarity search in Couchbase

2. ChatOpenAI Language Model:
   - Uses the 'gpt-4o' model
   - Temperature set to 0.2 to keep responses focused
   - Higher temperatures increase creativity and variation
   - Handles the actual text generation and responses
   - Acts as the brain of our RAG system for processing retrieved context

Both components require a valid OpenAI API key (OPENAI_API_KEY) for authentication.
The embeddings model is optimized for creating vector representations,
while the language model is optimized for understanding and generating human-like text.

In [10]:
# Initialize OpenAI components
embeddings = OpenAIEmbeddings(
    openai_api_key=OPENAI_API_KEY,
    model="text-embedding-3-small"
)

llm = ChatOpenAI(
    openai_api_key=OPENAI_API_KEY,
    model="gpt-4o",
    temperature=0.2
)

print("OpenAI components initialized")

OpenAI components initialized


# Setting Up the Couchbase Vector Store
A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used.

In [11]:
# Setup vector store
vector_store = CouchbaseVectorStore(
    cluster=cluster,
    bucket_name=CB_BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=INDEX_NAME,
)
print("Vector store initialized")

Vector store initialized
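
To make the comparison step concrete, here is a minimal pure-Python sketch of similarity search by dot product over a few made-up three-dimensional vectors. The documents, vectors, and the `top_k` helper are illustrative assumptions. The real store compares 1536-dimensional OpenAI embeddings through the Couchbase Search index rather than scanning a dictionary.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def top_k(query_vec, docs, k=2):
    """Return the k document titles whose vectors score highest against the query."""
    scored = [(dot(query_vec, vec), title) for title, vec in docs.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [title for _, title in scored[:k]]


# Made-up vectors: the first axis loosely stands for "football", the last for "food"
docs = {
    "FA Cup draw": [0.9, 0.1, 0.0],
    "Penalty miss": [0.7, 0.3, 0.1],
    "Food hygiene rating": [0.0, 0.2, 0.9],
}

print(top_k([1.0, 0.0, 0.0], docs))  # ['FA Cup draw', 'Penalty miss']
```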


# Load the BBC News Dataset
To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.

In [12]:
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")

2025-02-25 22:17:47 [INFO] Successfully loaded the BBC News dataset with 2687 rows.


Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.

In [13]:
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")

We have 1749 unique articles in our database.
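
A `set` does not preserve insertion order, so the order of `unique_news_articles` can differ from one run to the next. If a reproducible order matters, for example when comparing ingestion runs, an order-preserving variant is a small change. The sketch below relies on `dict.fromkeys`, which keeps the first occurrence of each key. `dedupe_preserving_order` is a hypothetical helper name.

```python
def dedupe_preserving_order(items):
    """Drop empty items and later duplicates, keeping first-seen order."""
    return list(dict.fromkeys(item for item in items if item))


sample = ["b", "a", "", "b", None, "c", "a"]
print(dedupe_preserving_order(sample))  # ['b', 'a', 'c']
```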


## Saving Data to the Vector Store
To handle the large number of articles efficiently, we ingest them in batches rather than all at once. Batch processing helps manage memory usage and gives better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. Memory Efficiency: Processing in smaller batches prevents memory overload
2. Error Handling: If an error occurs, only the current batch is affected
3. Progress Tracking: Easier to monitor and track the ingestion progress
4. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 100 to ensure reliable operation.
The optimal batch size depends on many factors including:
- Document sizes being inserted
- Available system resources
- Network conditions
- Concurrent workload

Consider measuring performance with your specific workload before adjusting.


In [14]:
batch_size = 100

# Automatic Batch Processing
articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully.")
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")

2025-02-25 22:19:28 [INFO] Document ingestion completed successfully.
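
For intuition about what `batch_size` controls, the sketch below splits a list into consecutive batches with a small generator. It is illustrative only: `iter_batches` is a hypothetical helper, and the vector store's own batching may differ in detail.

```python
def iter_batches(items, batch_size):
    """Yield consecutive slices of items, each at most batch_size long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


sizes = [len(batch) for batch in iter_batches(list(range(250)), 100)]
print(sizes)  # [100, 100, 50]
```

Looping over batches explicitly like this, with one `try`/`except` per batch, is one way to log or retry a failed batch without abandoning the rest of the ingestion.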


# Creating a Vector Search Tool
After loading our data into the vector store, we need to create a tool that can efficiently search through these vector embeddings. This involves two key components:

## Vector Retriever
The vector retriever is configured to perform similarity searches with specific parameters:
- k=8: Returns the 8 most similar documents
- fetch_k=20: Size of the candidate pool to fetch before narrowing to the top 8. LangChain documents this parameter for MMR search, so with `search_type="similarity"` it may have no effect

This two-stage approach is intended to balance accuracy and performance.

## Search Tool
The search tool wraps the retriever in a user-friendly interface that:
- Accepts natural language queries
- Handles both string and structured query inputs
- Formats results with clear document separation
- Includes metadata for traceability

The tool is designed to integrate seamlessly with our AI agents, providing them with reliable access to our knowledge base through vector similarity search.


In [15]:
# Create vector retriever
retriever = vector_store.as_retriever(
    search_type="similarity",
    search_kwargs={
        "k": 8,
        "fetch_k": 20
    }
)

# Create search tool using retriever
search_tool = Tool(
    name="vector_search",
    func=lambda query: "\n\n".join([
        f"Document {i+1}:\n{'-'*40}\n{doc.page_content}"
        for i, doc in enumerate(retriever.invoke(
            query if isinstance(query, str) else str(query.get('query', ''))
        ))
    ]),
    description="""Search for relevant documents using vector similarity.
    Input should be a simple text query string.
    Returns a list of relevant document contents with metadata.
    Use this tool to find detailed information about topics."""
)

print("Vector search tool created")

Vector search tool created
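
The lambda passed to `Tool` above does two things at once: it normalizes the input and formats the results. As a sketch of how that logic could be split into named, testable functions, the block below uses a stand-in `FakeDocument` class so it runs without LangChain. `FakeDocument`, `normalize_query`, and `format_documents` are hypothetical names, not part of the original notebook.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document so the sketch runs on its own."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def normalize_query(query):
    """Accept either a plain string or a dict such as {"query": "..."}."""
    if isinstance(query, str):
        return query
    return str(query.get("query", ""))


def format_documents(docs):
    """Join documents into one string with a numbered header per document."""
    return "\n\n".join(
        f"Document {i}:\n{'-' * 40}\n{doc.page_content}"
        for i, doc in enumerate(docs, start=1)
    )


print(format_documents([FakeDocument("First article"), FakeDocument("Second article")]))
```

In the notebook, the tool function would then reduce to `format_documents(retriever.invoke(normalize_query(query)))`. Like the lambda above, this sketch outputs only `page_content`; including metadata would mean extending `format_documents`.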


# Creating CrewAI Agents

We'll create two specialized AI agents using the CrewAI framework to handle different aspects of our information retrieval and analysis system:

## Research Expert Agent
This agent is designed to:
- Execute semantic searches using our vector store
- Analyze and evaluate search results 
- Identify key information and insights
- Verify facts across multiple sources
- Synthesize findings into comprehensive research summaries

## Technical Writer Agent  
This agent is responsible for:
- Taking research findings and structuring them logically
- Converting technical concepts into clear explanations
- Ensuring proper citation and attribution
- Maintaining engaging yet informative tone
- Producing well-formatted final outputs

The agents work together in a coordinated way:
1. Research agent finds and analyzes relevant documents
2. Writer agent takes those findings and crafts polished responses
3. Both agents use a custom response template for consistent output

This multi-agent approach allows us to:
- Leverage specialized expertise for different tasks
- Maintain high quality through separation of concerns
- Create more comprehensive and reliable outputs
- Scale the system's capabilities efficiently

In [16]:
# Custom response template
response_template = """
Analysis Results
===============
{%- if .Response %}
{{ .Response }}
{%- endif %}

Sources
=======
{%- for tool in .Tools %}
* {{ tool.name }}
{%- endfor %}

Metadata
========
* Confidence: {{ .Confidence }}
* Analysis Time: {{ .ExecutionTime }}
"""

# Create research agent
researcher = Agent(
    role='Research Expert',
    goal='Find and analyze the most relevant documents to answer user queries accurately',
    backstory="""You are an expert researcher with deep knowledge in information retrieval 
    and analysis. Your expertise lies in finding, evaluating, and synthesizing information 
    from various sources. You have a keen eye for detail and can identify key insights 
    from complex documents. You always verify information across multiple sources and 
    provide comprehensive, accurate analyses.""",
    tools=[search_tool],
    llm=llm,
    verbose=True,
    memory=True,
    allow_delegation=False,
    response_template=response_template
)

# Create writer agent
writer = Agent(
    role='Technical Writer',
    goal='Generate clear, accurate, and well-structured responses based on research findings',
    backstory="""You are a skilled technical writer with expertise in making complex 
    information accessible and engaging. You excel at organizing information logically, 
    explaining technical concepts clearly, and creating well-structured documents. You 
    ensure all information is properly cited, accurate, and presented in a user-friendly 
    manner. You have a talent for maintaining the reader's interest while conveying 
    detailed technical information.""",
    llm=llm,
    verbose=True,
    memory=True,
    allow_delegation=False,
    response_template=response_template
)

print("Agents created successfully")

Agents created successfully


# Testing the Search System

Test the system with some example queries.

In [17]:
def process_query(query, researcher, writer):
    print(f"\nQuery: {query}")
    print("-" * 80)
    
    # Create tasks
    research_task = Task(
        description=f"Research and analyze information relevant to: {query}",
        agent=researcher,
        expected_output="A detailed analysis with key findings and supporting evidence"
    )
    
    writing_task = Task(
        description="Create a comprehensive and well-structured response",
        agent=writer,
        expected_output="A clear, comprehensive response that answers the query",
        context=[research_task]
    )
    
    # Create and execute crew
    crew = Crew(
        agents=[researcher, writer],
        tasks=[research_task, writing_task],
        process=Process.sequential,
        verbose=True,
        cache=True,
        planning=True
    )
    
    try:
        start_time = time.time()
        result = crew.kickoff()
        elapsed_time = time.time() - start_time
        
        print(f"\nQuery completed in {elapsed_time:.2f} seconds")
        print("=" * 80)
        print("RESPONSE")
        print("=" * 80)
        print(result)
        
        if hasattr(result, 'tasks_output'):
            print("\n" + "=" * 80)
            print("DETAILED TASK OUTPUTS")
            print("=" * 80)
            for task_output in result.tasks_output:
                print(f"\nTask: {task_output.description[:100]}...")
                print("-" * 40)
                print(f"Output: {task_output.raw}")
                print("-" * 40)
    except Exception as e:
        print(f"Error executing crew: {str(e)}")
        logging.error(f"Crew execution failed: {str(e)}", exc_info=True)

In [20]:
query = "What are the key details about the FA Cup third round draw? Include information about Manchester United vs Arsenal, Tamworth vs Tottenham, and other notable fixtures."
process_query(query, researcher, writer)

22:24:56 - LiteLLM:INFO: utils.py:2896 - 
LiteLLM completion() model= gpt-4o-mini; provider = openai
2025-02-25 22:24:56 [INFO] 
LiteLLM completion() model= gpt-4o-mini; provider = openai



Query: What are the key details about the FA Cup third round draw? Include information about Manchester United vs Arsenal, Tamworth vs Tottenham, and other notable fixtures.
--------------------------------------------------------------------------------

[2025-02-25 22:24:56][INFO]: Planning the crew execution


22:25:07 - LiteLLM:INFO: utils.py:1084 - Wrapper: Completed Call, calling success_handler
2025-02-25 22:25:07 [INFO] Wrapper: Completed Call, calling success_handler
22:25:07 - LiteLLM:INFO: utils.py:2896 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-02-25 22:25:07 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai


# Agent: Research Expert
## Task: Research and analyze information relevant to: What are the key details about the FA Cup third round draw? Include information about Manchester United vs Arsenal, Tamworth vs Tottenham, and other notable fixtures.1. Open the vector_search tool to find relevant documents. 2. Enter a well-crafted query such as 'FA Cup third round draw, key details, Manchester United vs Arsenal, Tamworth vs Tottenham.' 3. Execute the search to retrieve a list of relevant documents with metadata. 4. Review the search results carefully, focusing on content that specifically addresses: a) Key details about the FA Cup third round draw, b) Specific match information about Manchester United vs Arsenal, c) Specific match information about Tamworth vs Tottenham, d) Any other notable fixtures and their significance. 5. Take detailed notes on key findings and supporting evidence from the documents retrieved. 6. Organize the information logi

22:25:09 - LiteLLM:INFO: utils.py:1084 - Wrapper: Completed Call, calling success_handler
2025-02-25 22:25:09 [INFO] Wrapper: Completed Call, calling success_handler
22:25:09 - LiteLLM:INFO: utils.py:2896 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-02-25 22:25:09 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai


# Agent: Research Expert
## Thought: I need to gather detailed information about the FA Cup third round draw, focusing on specific matches and notable fixtures. To do this, I will use the vector_search tool to find relevant documents.
## Using tool: vector_search
## Tool Input: 
"{\"query\": \"FA Cup third round draw, key details, Manchester United vs Arsenal, Tamworth vs Tottenham\"}"
## Tool Output: 
Document 1:
----------------------------------------
'Life is not easy' - Haaland penalty miss sums up Man City crisis

Manchester City striker Erling Haaland has now missed two of his 17 penalties taken in the Premier League

Nothing seems to be going Manchester City's way at the moment - and star striker Erling Haaland is not a happy man. If there was any player currently in the Premier League you would hand the ball to for a penalty to win a match, it would be the prolific Norwegia

22:25:11 - LiteLLM:INFO: utils.py:1084 - Wrapper: Completed Call, calling success_handler
2025-02-25 22:25:11 [INFO] Wrapper: Completed Call, calling success_handler
22:25:12 - LiteLLM:INFO: utils.py:2896 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-02-25 22:25:12 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai




# Agent: Research Expert
## Using tool: vector_search
## Tool Input: 
"{\"query\": \"FA Cup third round draw 2023, Manchester United vs Arsenal, Tamworth vs Tottenham, notable fixtures\"}"
## Tool Output: 
Document 1:
----------------------------------------
The club is now working with the council to improve its standards

The discovery of mouse droppings at Manchester United's Old Trafford stadium has led inspectors to downgrade the football club's food hygiene rating. Though no droppings were found in the actual kitchens, a council environmental health officer slashed the club's rating down from four to two out of five, following a visit in November. The club said a "robust pest-control system" was in place at the ground and included "multiple weekly checks" of all catering areas. Trafford Council said it was working closely with the club so "necessary improvements" were made.

The Food Standards Ag

22:25:14 - LiteLLM:INFO: utils.py:1084 - Wrapper: Completed Call, calling success_handler
2025-02-25 22:25:14 [INFO] Wrapper: Completed Call, calling success_handler
22:25:15 - LiteLLM:INFO: utils.py:2896 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-02-25 22:25:15 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai




# Agent: Research Expert
## Using tool: vector_search
## Tool Input: 
"{\"query\": \"FA Cup third round draw 2023 details, Manchester United vs Arsenal, Tamworth vs Tottenham, notable fixtures\"}"
## Tool Output: 
Document 1:
----------------------------------------
Uefa 'not worried' at Euro 2025 clash with Club World Cup

England won Euro 2022 on home soil beating Germany in the final at Wembley

Uefa is "not worried" that some Women's Euro 2025 matches will clash with the Fifa men's Club World Cup, says managing director Nadine Kessler. Fifa's new expanded men's tournament takes place in the United States from 15 June to 13 July and involves 32 teams, including Chelsea and Manchester City. Switzerland is hosting Euro 2025 from 2-27 July, meaning five or six matches are likely to be played at the same time as some in the men's competition. "Overall, I'm not really worried. We're talking about two tou

22:25:19 - LiteLLM:INFO: utils.py:1084 - Wrapper: Completed Call, calling success_handler
2025-02-25 22:25:19 [INFO] Wrapper: Completed Call, calling success_handler
22:25:20 - LiteLLM:INFO: utils.py:2896 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-02-25 22:25:20 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai




# Agent: Research Expert
## Using tool: vector_search
## Tool Input: 
"{\"query\": \"FA Cup third round draw 2023, Manchester United Arsenal, Tamworth Tottenham, fixtures\"}"
## Tool Output: 
Document 1:
----------------------------------------
The club is now working with the council to improve its standards

The discovery of mouse droppings at Manchester United's Old Trafford stadium has led inspectors to downgrade the football club's food hygiene rating. Though no droppings were found in the actual kitchens, a council environmental health officer slashed the club's rating down from four to two out of five, following a visit in November. The club said a "robust pest-control system" was in place at the ground and included "multiple weekly checks" of all catering areas. Trafford Council said it was working closely with the club so "necessary improvements" were made.

The Food Standards Agency rating sc

22:25:24 - LiteLLM:INFO: utils.py:1084 - Wrapper: Completed Call, calling success_handler
2025-02-25 22:25:24 [INFO] Wrapper: Completed Call, calling success_handler
22:25:24 - LiteLLM:INFO: utils.py:2896 - 
LiteLLM completion() model= gpt-4o; provider = openai
2025-02-25 22:25:24 [INFO] 
LiteLLM completion() model= gpt-4o; provider = openai




# Agent: Research Expert
## Final Answer: 
I was unable to find relevant information about the FA Cup third round draw, including details on Manchester United vs Arsenal, Tamworth vs Tottenham, and other notable fixtures. The search results did not contain the necessary details.


# Agent: Technical Writer
## Task: Create a comprehensive and well-structured response1. Begin by reviewing the findings from the previous task, ensuring all notes are clear and comprehensive. 2. Outline the final response structure, which should include sections such as: a) Introduction to the FA Cup third round, b) Overview of all notable fixtures with emphasis on Manchester United vs Arsenal and Tamworth vs Tottenham, c) Analysis of key details and implications of these matches, d) Conclusion summarizing the importance of the third round draw. 3. Using the outline, draft the response in a clear and precise mann

22:25:31 - LiteLLM:INFO: utils.py:1084 - Wrapper: Completed Call, calling success_handler
2025-02-25 22:25:31 [INFO] Wrapper: Completed Call, calling success_handler




# Agent: Technical Writer
## Final Answer: 
**Introduction to the FA Cup Third Round**

The FA Cup, known for its rich history and tradition, is one of the most prestigious domestic cup competitions in English football. The third round of the FA Cup is particularly significant as it marks the entry of Premier League and Championship clubs into the tournament. This stage often brings thrilling encounters and the potential for giant-killing acts, where lower-league teams have the opportunity to upset top-tier clubs.

**Overview of Notable Fixtures**

The third round draw often features several intriguing matchups, and this year is no exception. Among the standout fixtures are Manchester United vs Arsenal and Tamworth vs Tottenham. These matches not only promise exciting football but also carry historical and competitive significance.

- **Manchester United vs Arsenal**: This fixture is a classic rivalry in English football, with both teams boa

## Conclusion
By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and CrewAI. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.