diff --git a/tutorial/markdown/generated/vector-search-cookbook/Bedrock_Agents_Custom_Control.md b/tutorial/markdown/generated/vector-search-cookbook/Bedrock_Agents_Custom_Control.md deleted file mode 100644 index 172a03d..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/Bedrock_Agents_Custom_Control.md +++ /dev/null @@ -1,952 +0,0 @@ ---- -path: "/tutorial-aws-bedrock-agents-custom-control" -title: Building Intelligent Agents with AWS Bedrock (Custom Control) -short_title: AWS Bedrock Agents Custom Control Approach -description: - - Learn how to build intelligent agents using Amazon Bedrock Agents with a custom control approach and Couchbase as the vector store. - - This tutorial demonstrates how to create specialized agents that can process documents and interact with external APIs using custom control flows. - - You'll understand how to implement secure multi-agent architectures using Amazon Bedrock's agent capabilities with fine-grained control over agent behavior. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - Amazon Bedrock -sdk_language: - - python -length: 90 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/awsbedrock-agents/custom-control-approach/Bedrock_Agents_Custom_Control.ipynb) - -# AWS Bedrock Agents with Couchbase Vector Search - Custom Control Approach - -This notebook demonstrates the Custom Control approach for implementing AWS Bedrock agents with Couchbase Vector Search. In this approach, the agent returns control to the application for function execution. - -We'll implement a multi-agent architecture with specialized agents for different tasks: -- **Researcher Agent**: Searches for relevant documents in the vector store -- **Writer Agent**: Formats and presents the research findings - -## Alternative Approaches - -This notebook demonstrates the Custom Control approach for AWS Bedrock Agents. For comparison, you might also want to check out the Lambda Approach, which uses AWS Lambda functions to execute agent tools instead of handling them directly in your application code. - -The Lambda approach offers better separation of concerns and scalability, but requires more setup. You can find that implementation here: [Lambda Approach Notebook](https://developer.couchbase.com/tutorial-aws-bedrock-agents-lambda) -Note: If the link above doesn't work in your Jupyter environment, you can navigate to the file manually in the `awsbedrock-agents/lambda-approach/` directory. - -## Overview - -The Custom Control approach gives the application invoking the agent the responsibility of executing the agent's defined functions (tools). When the agent decides to use a tool, it sends a `returnControl` event back to the calling application, which then executes the function locally and (optionally) returns the result to the agent to continue processing. - -## Key Steps & Concepts - -1. **Define Agent:** - * Define instructions (prompt) for the agent. - * Define the function schema (tools the agent can use, e.g., `researcher_functions`, `writer_functions` in the example). - -2. **Create Agent in Bedrock:** - * Use `bedrock_agent_client.create_agent` to create the agent, providing the instructions and foundation model. - * The example's `create_agent` function includes logic to check for existing agents and potentially delete/recreate them if they are in a non-functional state. - -3. 
**Create Action Group (Custom Control):**
    * Use `bedrock_agent_client.create_agent_action_group`.
    * Crucially, set `actionGroupExecutor` to `{"customControl": "RETURN_CONTROL"}`. This tells Bedrock to pause execution and return control to the caller whenever a function in this group needs to run.
    * Provide the `functionSchema` defined earlier.

4. **Prepare Agent:**
    * Use `bedrock_agent_client.prepare_agent` to make the agent ready for invocation.
    * The `wait_for_agent_status` utility function polls until the agent reaches a `PREPARED` or `Available` state.

5. **Create Agent Alias:**
    * An alias (e.g., "v1") is created using `bedrock_agent_client.create_agent_alias` for invoking the agent.

6. **Invoke Agent & Handle Return Control (Custom Control Flow)**

    When the application invokes the agent and the agent decides to use a tool, the Custom Control mechanism takes effect: instead of running the tool itself, Bedrock sends a `returnControl` event back to the application. The application parses this event to identify the requested function and its parameters, executes that function locally using its own resources (such as the vector store), and the result of that local execution becomes the final output for that agent interaction. If further steps are needed with another agent, a new, separate invocation is made using this output. A condensed sketch of this event handling follows the Pros and Cons lists below.

    * Application calls `invoke_agent` to interact with Bedrock.
    * Agent signals tool use via a `returnControl` event in the response stream.
    * Application parses the event, extracting the function name and parameters.
    * Application executes the specified function locally, accessing its own resources.
    * **The output from this local function execution is the final result for that agent's turn.**

## Pros

* **Full Control:** The application has complete control over the execution environment and logic of the tools.
* **Direct State Access:** Tools can directly access application memory, state, and resources (like the `vector_store` object in the example) without separate deployment or complex configuration passing.
* **Simpler Local Development:** Testing and debugging are easier because tool execution happens within the same process.
* **Flexibility:** Allows integration with any library or service available to the application.

## Cons

* **Application Burden:** The application code is responsible for implementing and executing the tool logic.
* **Scalability:** The scalability of tool execution is tied to the scalability of the application itself.
* **Tighter Coupling:** The agent's functionality is more tightly coupled to the application code.
* **Interaction Model:** The implementation shown chains separate agent invocations rather than letting the agent continue within a single turn after a tool runs. Returning tool results to the agent mid-turn (via the `ReturnControl` response) is possible but adds complexity to the application's handling of the `invoke_agent` request/response cycle.
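To make the custom-control flow concrete before diving into setup, the sketch below shows the approximate shape of a `returnControl` event and how it can be parsed. The field values are illustrative; the structure follows what the `bedrock-agent-runtime` `invoke_agent` completion stream yields for custom-control action groups, and the full handling logic appears in the `invoke_agent` helper later in this notebook.

```python
# Illustrative shape of a returnControl event (all values are made up);
# the structure mirrors what invoke_agent's completion stream yields when
# a custom-control action group function is requested.
sample_event = {
    "returnControl": {
        "invocationId": "11111111-2222-3333-4444-555555555555",
        "invocationInputs": [
            {
                "functionInvocationInput": {
                    "actionGroup": "researcher_actions",
                    "function": "search_documents",
                    "parameters": [
                        {"name": "query", "type": "string", "value": "What is unique about Cline?"},
                        {"name": "k", "type": "integer", "value": "3"},
                    ],
                }
            }
        ],
    }
}

def parse_return_control(event):
    """Extract the requested function name and a parameter dict from a returnControl event."""
    fn_input = event["returnControl"]["invocationInputs"][0]["functionInvocationInput"]
    params = {p["name"]: p["value"] for p in fn_input.get("parameters", [])}
    return fn_input["function"], params

print(parse_return_control(sample_event))
# ('search_documents', {'query': 'What is unique about Cline?', 'k': '3'})
```

Note that parameter values arrive as strings, so numeric arguments such as `k` must be cast by the application before use, as the `invoke_agent` helper below does.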
- -## Setup and Configuration - -First, let's import the necessary libraries and set up our environment: - - -```python -import json -import logging -import os -import time -import uuid -from datetime import timedelta - -import boto3 -from botocore.exceptions import ClientError -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (InternalServerFailureException, - QueryIndexAlreadyExistsException, - ServiceUnavailableException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from dotenv import load_dotenv -from langchain_aws import BedrockEmbeddings -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore - -# Setup logging -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') -``` - -## Load Environment Variables - -Load environment variables from the .env file. Make sure to create a .env file with the necessary credentials before running this notebook. - - -```python -# Load environment variables -load_dotenv() - -# Couchbase Configuration -CB_HOST = os.getenv("CB_HOST", "couchbase://localhost") -CB_USERNAME = os.getenv("CB_USERNAME", "Administrator") -CB_PASSWORD = os.getenv("CB_PASSWORD", "password") -CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME", "vector-search-testing") -SCOPE_NAME = os.getenv("SCOPE_NAME", "shared") -COLLECTION_NAME = os.getenv("COLLECTION_NAME", "bedrock") -INDEX_NAME = os.getenv("INDEX_NAME", "vector_search_bedrock") - -# AWS Configuration -AWS_REGION = os.getenv("AWS_REGION", "us-east-1") -AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID") -AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") -AWS_ACCOUNT_ID = os.getenv("AWS_ACCOUNT_ID") - -# Check if required environment variables are set -required_vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"] -missing_vars = [var for var in required_vars if not os.getenv(var)] -if missing_vars: - logging.warning(f"Missing required environment variables: {', '.join(missing_vars)}") - logging.warning("Please set these variables in your .env file") -else: - logging.info("All required environment variables are set") -``` - - 2025-05-08 13:34:15,605 - INFO - All required environment variables are set - - -## Initialize AWS Clients - -Set up the AWS clients for Bedrock and other services: - - -```python -# Initialize AWS session -session = boto3.Session( - aws_access_key_id=AWS_ACCESS_KEY_ID, - aws_secret_access_key=AWS_SECRET_ACCESS_KEY, - region_name=AWS_REGION -) - -# Initialize AWS clients from session -bedrock_client = session.client('bedrock') -bedrock_agent_client = session.client('bedrock-agent') -bedrock_runtime = session.client('bedrock-runtime') -bedrock_runtime_client = session.client('bedrock-agent-runtime') -iam_client = session.client('iam') - -logging.info("AWS clients initialized successfully") -``` - - 2025-05-08 13:34:15,836 - INFO - AWS clients initialized successfully - - -## Set Up Couchbase and Vector Store - -Now let's set up the Couchbase connection, collections, and vector store: - - -```python -# Connect to Couchbase -auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) -options = ClusterOptions(auth) -cluster = Cluster(CB_HOST, options) -cluster.wait_until_ready(timedelta(seconds=5)) -logging.info("Successfully connected to Couchbase") -``` - - 2025-05-08 13:34:17,966 - INFO - Successfully connected to Couchbase - - - - -## Create Couchbase Bucket, Scope, 
and Collection - -The following code block ensures that the necessary Couchbase bucket, scope, and collection are available. -It will create them if they don't exist, and also clear any existing documents from the collection to start fresh. - -> Note: Bucket Creation will fail on Capella - - -```python -# Set up collection -try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(CB_BUCKET_NAME) - logging.info(f"Bucket '{CB_BUCKET_NAME}' exists.") - except Exception as e: - logging.info(f"Bucket '{CB_BUCKET_NAME}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=CB_BUCKET_NAME, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - bucket = cluster.bucket(CB_BUCKET_NAME) - logging.info(f"Bucket '{CB_BUCKET_NAME}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == SCOPE_NAME for scope in scopes) - - if not scope_exists and SCOPE_NAME != "_default": - logging.info(f"Scope '{SCOPE_NAME}' does not exist. Creating it...") - bucket_manager.create_scope(SCOPE_NAME) - logging.info(f"Scope '{SCOPE_NAME}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == SCOPE_NAME and COLLECTION_NAME in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{COLLECTION_NAME}' does not exist. Creating it...") - bucket_manager.create_collection(SCOPE_NAME, COLLECTION_NAME) - logging.info(f"Collection '{COLLECTION_NAME}' created successfully.") - else: - logging.info(f"Collection '{COLLECTION_NAME}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(SCOPE_NAME).collection(COLLECTION_NAME) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{CB_BUCKET_NAME}`.`{SCOPE_NAME}`.`{COLLECTION_NAME}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.error(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{CB_BUCKET_NAME}`.`{SCOPE_NAME}`.`{COLLECTION_NAME}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - -except Exception as e: - logging.error(f"Error setting up collection: {str(e)}") - raise -``` - - 2025-05-08 13:34:19,133 - INFO - Bucket 'vector-search-testing' exists. - 2025-05-08 13:34:21,149 - INFO - Collection 'bedrock' already exists. Skipping creation. - 2025-05-08 13:34:24,304 - INFO - Primary index present or created successfully. - 2025-05-08 13:34:24,529 - INFO - All documents cleared from the collection. - - -## Configure Couchbase Search Index - -This section focuses on setting up the Couchbase Search Index, which is essential for enabling vector search capabilities. -* The code will load an index definition from a local JSON file named `aws_index.json`. 
-* **Important Note:** The provided `aws_index.json` file has hardcoded references for the bucket, scope, and collection names. If you have used different names for your bucket, scope, or collection than the defaults specified in this notebook or your `.env` file, you **must** modify the `aws_index.json` file to reflect your custom names before running the next cell. - - -```python -# Set up search indexes -try: - # Construct path relative to the script file - # In a notebook, __file__ is not defined, so use os.getcwd() instead - script_dir = os.getcwd() - index_file_path = os.path.join(script_dir, 'aws_index.json') - # Load index definition from file - with open(index_file_path, 'r') as file: - index_definition = json.load(file) - logging.info(f"Loaded index definition from aws_index.json") -except Exception as e: - logging.error(f"Error loading index definition: {str(e)}") - raise - -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - logging.error("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-05-08 13:34:24,537 - INFO - Loaded index definition from aws_index.json - 2025-05-08 13:34:25,659 - INFO - Index 'vector_search_bedrock' found - 2025-05-08 13:34:26,348 - INFO - Index 'vector_search_bedrock' already exists. Skipping creation/update. - - - -```python -# Initialize Bedrock runtime client for embeddings -embeddings = BedrockEmbeddings( - client=bedrock_runtime, - model_id="amazon.titan-embed-text-v2:0" -) -logging.info("Successfully created Bedrock embeddings client") - -# Initialize vector store -vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME -) -logging.info("Successfully created vector store") -``` - - 2025-05-08 13:34:26,353 - INFO - Successfully created Bedrock embeddings client - 2025-05-08 13:34:29,660 - INFO - Successfully created vector store - - -# Load Documents into Vector Store - -Let's load the documents from the documents.json file and add them to our vector store: ->Note: `documents.json` contains the documents that we want to load into our vector store. 
As an example, we have added a few documents to the file from [https://cline.bot/](https://cline.bot/) - - -```python -# Load documents from JSON file -try: - # In a notebook, __file__ is not defined, so use os.getcwd() instead - script_dir = os.getcwd() - documents_file_path = os.path.join(script_dir, 'documents.json') - with open(documents_file_path, 'r') as f: - data = json.load(f) - documents = data.get('documents', []) - logging.info(f"Loaded {len(documents)} documents from documents.json") -except Exception as e: - logging.error(f"Error loading documents: {str(e)}") - raise - -# Add documents to vector store -logging.info(f"Adding {len(documents)} documents to vector store...") -for i, doc in enumerate(documents, 1): - text = doc.get('text', '') - metadata = doc.get('metadata', {}) - - # Ensure metadata is a dictionary before adding - if isinstance(metadata, str): - try: - metadata = json.loads(metadata) - except json.JSONDecodeError: - logging.warning(f"Warning: Could not parse metadata for document {i}. Using empty metadata.") - metadata = {} - elif not isinstance(metadata, dict): - logging.warning(f"Warning: Metadata for document {i} is not a dict or valid JSON string. Using empty metadata.") - metadata = {} - - doc_id = vector_store.add_texts([text], [metadata])[0] - logging.info(f"Added document {i}/{len(documents)} with ID: {doc_id}") - - # Add small delay between requests - time.sleep(1) - -logging.info(f"\nProcessing complete: {len(documents)}/{len(documents)} documents added successfully") -``` - - 2025-05-08 13:34:29,670 - INFO - Loaded 7 documents from documents.json - 2025-05-08 13:34:29,670 - INFO - Adding 7 documents to vector store... - 2025-05-08 13:34:31,637 - INFO - Added document 1/7 with ID: 884e8caae84545aa9e4735538b38f373 - 2025-05-08 13:34:33,211 - INFO - Added document 2/7 with ID: 61b9d4c9c5ee42a8a51e44ef0b55942a - 2025-05-08 13:34:34,784 - INFO - Added document 3/7 with ID: c7cb7541a9004ead83b9b393bc44a9b5 - 2025-05-08 13:34:36,886 - INFO - Added document 4/7 with ID: c8b07eae2e3a42c1a8114397bc8bfa67 - 2025-05-08 13:34:38,534 - INFO - Added document 5/7 with ID: a4356e0801564ad1b2f3ccdf05284375 - 2025-05-08 13:34:40,129 - INFO - Added document 6/7 with ID: 647d0fddba8f4bb38fd66d291a669bb2 - 2025-05-08 13:34:42,140 - INFO - Added document 7/7 with ID: 3b57038a7a234992927756cb3307738f - 2025-05-08 13:34:43,142 - INFO - - Processing complete: 7/7 documents added successfully - - -## Custom Control Approach Implementation - -Now let's implement the Custom Control approach for Bedrock agents. In this approach, the agent returns control to the application for function execution. - - -```python -# Function to wait for agent status -def wait_for_agent_status(bedrock_agent_client, agent_id, target_statuses=['Available', 'PREPARED', 'NOT_PREPARED'], max_attempts=30, delay=2): - """Wait for agent to reach any of the target statuses""" - for attempt in range(max_attempts): - try: - response = bedrock_agent_client.get_agent(agentId=agent_id) - current_status = response['agent']['agentStatus'] - - if current_status in target_statuses: - logging.info(f"Agent {agent_id} reached status: {current_status}") - return current_status - elif current_status == 'FAILED': - logging.error(f"Agent {agent_id} failed") - return 'FAILED' - - logging.info(f"Agent status: {current_status}, waiting... 
(attempt {attempt + 1}/{max_attempts})") - time.sleep(delay) - - except Exception as e: - logging.error(f"Error checking agent status: {str(e)}") - time.sleep(delay) - - return current_status -``` - - -```python -# Function to create a Bedrock agent with Custom Control action groups -def create_agent(bedrock_agent_client, name, instructions, functions, model_id="amazon.nova-pro-v1:0", agent_role_arn=None): - """Create a Bedrock agent with Custom Control action groups""" - try: - # List existing agents - existing_agents = bedrock_agent_client.list_agents() - existing_agent = next( - (agent for agent in existing_agents['agentSummaries'] - if agent['agentName'] == name), - None - ) - - # Handle existing agent - if existing_agent: - agent_id = existing_agent['agentId'] - logging.info(f"Found existing agent '{name}' with ID: {agent_id}") - - # Check agent status - response = bedrock_agent_client.get_agent(agentId=agent_id) - status = response['agent']['agentStatus'] - - if status in ['NOT_PREPARED', 'FAILED']: - logging.info(f"Deleting agent '{name}' with status {status}") - bedrock_agent_client.delete_agent(agentId=agent_id) - time.sleep(10) # Wait after deletion - existing_agent = None - - # Create new agent if needed - if not existing_agent: - logging.info(f"Creating new agent '{name}'") - agent_params = { - "agentName": name, - "description": f"{name.title()} agent for document operations", - "instruction": instructions, - "idleSessionTTLInSeconds": 1800, - "foundationModel": model_id - } - - if agent_role_arn: - agent_params["agentResourceRoleArn"] = agent_role_arn - - agent = bedrock_agent_client.create_agent(**agent_params) - agent_id = agent['agent']['agentId'] - logging.info(f"Created new agent '{name}' with ID: {agent_id}") - else: - agent_id = existing_agent['agentId'] - - # Wait for initial creation if needed - status = wait_for_agent_status(bedrock_agent_client, agent_id, target_statuses=['NOT_PREPARED', 'PREPARED', 'Available']) - if status not in ['NOT_PREPARED', 'PREPARED', 'Available']: - raise Exception(f"Agent failed to reach valid state: {status}") - - # Create action group if needed - try: - bedrock_agent_client.create_agent_action_group( - agentId=agent_id, - agentVersion="DRAFT", - actionGroupExecutor={"customControl": "RETURN_CONTROL"}, # This is the key for Custom Control - actionGroupName=f"{name}_actions", - functionSchema={"functions": functions}, - description=f"Action group for {name} operations" - ) - logging.info(f"Created action group for agent '{name}'") - time.sleep(5) - except bedrock_agent_client.exceptions.ConflictException: - logging.info(f"Action group already exists for agent '{name}'") - - # Prepare agent if needed - if status == 'NOT_PREPARED': - try: - logging.info(f"Starting preparation for agent '{name}'") - bedrock_agent_client.prepare_agent(agentId=agent_id) - status = wait_for_agent_status( - bedrock_agent_client, - agent_id, - target_statuses=['PREPARED', 'Available'] - ) - logging.info(f"Agent '{name}' preparation completed with status: {status}") - except Exception as e: - logging.error(f"Error during preparation: {str(e)}") - - # Handle alias creation/retrieval - try: - aliases = bedrock_agent_client.list_agent_aliases(agentId=agent_id) - alias = next((a for a in aliases['agentAliasSummaries'] if a['agentAliasName'] == 'v1'), None) - - if not alias: - logging.info(f"Creating new alias for agent '{name}'") - alias = bedrock_agent_client.create_agent_alias( - agentId=agent_id, - agentAliasName="v1" - ) - alias_id = 
alias['agentAlias']['agentAliasId'] - else: - alias_id = alias['agentAliasId'] - logging.info(f"Using existing alias for agent '{name}'") - - logging.info(f"Successfully configured agent '{name}' with ID: {agent_id} and alias: {alias_id}") - return agent_id, alias_id - - except Exception as e: - logging.error(f"Error managing alias: {str(e)}") - raise - - except Exception as e: - logging.error(f"Error creating/updating agent: {str(e)}") - raise -``` - - -```python -# Function to invoke a Bedrock agent -def invoke_agent(bedrock_runtime_client, agent_id, alias_id, input_text, session_id=None, vector_store=None): - """Invoke a Bedrock agent""" - if session_id is None: - session_id = str(uuid.uuid4()) - - try: - logging.info(f"Invoking agent with input: {input_text}") - - response = bedrock_runtime_client.invoke_agent( - agentId=agent_id, - agentAliasId=alias_id, - sessionId=session_id, - inputText=input_text, - enableTrace=True - ) - - result = "" - - for event in response['completion']: - # Process text chunks - if 'chunk' in event: - chunk = event['chunk']['bytes'].decode('utf-8') - result += chunk - - # Handle custom control return - if 'returnControl' in event: - return_control = event['returnControl'] - invocation_inputs = return_control.get('invocationInputs', []) - - if invocation_inputs: - function_input = invocation_inputs[0].get('functionInvocationInput', {}) - action_group = function_input.get('actionGroup') - function_name = function_input.get('function') - parameters = function_input.get('parameters', []) - - # Convert parameters to a dictionary - param_dict = {} - for param in parameters: - param_dict[param.get('name')] = param.get('value') - - logging.info(f"Function call: {action_group}::{function_name}") - - # Handle search_documents function - if function_name == 'search_documents': - query = param_dict.get('query') - k = int(param_dict.get('k', 3)) - - logging.info(f"Searching for: {query}, k={k}") - - if vector_store: - # Perform the search - docs = vector_store.similarity_search(query, k=k) - - # Format results - search_results = [doc.page_content for doc in docs] - logging.info(f"Found {len(search_results)} results") - - # Format the response - result = f"Search results for '{query}':\n\n" - for i, content in enumerate(search_results): - result += f"Result {i+1}: {content}\n\n" - else: - logging.error("Vector store not available") - result = "Error: Vector store not available" - - # Handle format_content function - elif function_name == 'format_content': - content = param_dict.get('content') - style = param_dict.get('style', 'user-friendly') - - logging.info(f"Formatting content in {style} style") - - # Check if content is valid - if content and content != '?': - result = f"Formatted in {style} style: {content}" - else: - result = "No content provided to format." - else: - logging.error(f"Unknown function: {function_name}") - result = f"Error: Unknown function {function_name}" - - if not result.strip(): - logging.warning("Received empty response from agent") - - return result - - except Exception as e: - logging.error(f"Error invoking agent: {str(e)}") - raise RuntimeError(f"Failed to invoke agent: {str(e)}") -``` - -## Define Agent Instructions and Functions - -Now let's define the instructions and functions for our agents: - - -```python -# Researcher agent instructions and functions -researcher_instructions = """ -You are a Research Assistant that helps users find relevant information in documents. -Your capabilities include: -1. 
Searching through documents using semantic similarity -2. Providing relevant document excerpts -3. Answering questions based on document content -""" - -researcher_functions = [ - { - "name": "search_documents", - "description": "Search for relevant documents using semantic similarity", - "parameters": { - "query": { - "type": "string", - "description": "The search query", - "required": True - }, - "k": { - "type": "integer", - "description": "Number of results to return", - "required": False - } - }, - "requireConfirmation": "DISABLED" - } -] - -# Writer agent instructions and functions -writer_instructions = """ -You are a Content Writer Assistant that helps format and present research findings. -Your capabilities include: -1. Formatting research findings in a user-friendly way -2. Creating clear and engaging summaries -3. Organizing information logically -4. Highlighting key insights -""" - -writer_functions = [ - { - "name": "format_content", - "description": "Format and present research findings", - "parameters": { - "content": { - "type": "string", - "description": "The research findings to format", - "required": True - }, - "style": { - "type": "string", - "description": "The desired presentation style (e.g., summary, detailed, bullet points)", - "required": False - } - }, - "requireConfirmation": "DISABLED" - } -] -``` - -## Run Custom Control Approach - -Now let's run the Custom Control approach with our agents: - - -```python -# Get or Create IAM Role -agent_role_name = "BedrockExecutionRoleForAgents_CustomControl" -trust_policy = { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Service": "bedrock.amazonaws.com" - }, - "Action": "sts:AssumeRole" - } - ] -} -policy_arn_to_attach = "arn:aws:iam::aws:policy/AmazonBedrockFullAccess" - -try: - role_response = iam_client.get_role(RoleName=agent_role_name) - agent_role_arn = role_response['Role']['Arn'] - logging.info(f"Found existing IAM role '{agent_role_name}' with ARN: {agent_role_arn}") -except ClientError as e: - if e.response['Error']['Code'] == 'NoSuchEntity': - logging.info(f"IAM role '{agent_role_name}' not found. 
Creating...") - try: - role_response = iam_client.create_role( - RoleName=agent_role_name, - AssumeRolePolicyDocument=json.dumps(trust_policy), - Description="IAM role for Bedrock Agents execution" - ) - agent_role_arn = role_response['Role']['Arn'] - logging.info(f"Created IAM role '{agent_role_name}' with ARN: {agent_role_arn}") - # Wait a bit for the role to be fully available before attaching policy - time.sleep(10) - except ClientError as create_error: - logging.error(f"Error creating IAM role '{agent_role_name}': {create_error}") - agent_role_arn = None - else: - logging.error(f"Error getting IAM role '{agent_role_name}': {e}") - agent_role_arn = None - -# Attach the policy if not already attached -if agent_role_arn: - try: - attached_policies = iam_client.list_attached_role_policies(RoleName=agent_role_name) - if not any(p['PolicyArn'] == policy_arn_to_attach for p in attached_policies.get('AttachedPolicies', [])): - logging.info(f"Attaching policy '{policy_arn_to_attach}' to role '{agent_role_name}'...") - iam_client.attach_role_policy( - RoleName=agent_role_name, - PolicyArn=policy_arn_to_attach - ) - logging.info(f"Policy '{policy_arn_to_attach}' attached successfully.") - # Wait a bit for the policy attachment to propagate - time.sleep(5) - else: - logging.info(f"Policy '{policy_arn_to_attach}' already attached to role '{agent_role_name}'.") - except ClientError as attach_error: - logging.warning(f"Error attaching policy to role '{agent_role_name}': {attach_error}") -``` - - 2025-05-08 13:34:44,254 - INFO - Found existing IAM role 'BedrockExecutionRoleForAgents_CustomControl' with ARN: arn:aws:iam::598307997273:role/BedrockExecutionRoleForAgents_CustomControl - 2025-05-08 13:34:44,547 - INFO - Policy 'arn:aws:iam::aws:policy/AmazonBedrockFullAccess' already attached to role 'BedrockExecutionRoleForAgents_CustomControl'. 
- - - -```python -# Create researcher agent -researcher_id = None -researcher_alias = None - -if agent_role_arn: - try: - researcher_id, researcher_alias = create_agent( - bedrock_agent_client, - "researcher", - researcher_instructions, - researcher_functions, - agent_role_arn=agent_role_arn - ) - logging.info(f"Researcher agent created with ID: {researcher_id} and alias: {researcher_alias}") - except Exception as e: - logging.error(f"Failed to create researcher agent: {str(e)}") -else: - logging.error("No agent role ARN available for researcher agent creation") -``` - - 2025-05-08 13:34:45,303 - INFO - Found existing agent 'researcher' with ID: FF1OSFJIJF - 2025-05-08 13:34:46,399 - INFO - Agent FF1OSFJIJF reached status: PREPARED - 2025-05-08 13:34:46,712 - INFO - Action group already exists for agent 'researcher' - 2025-05-08 13:34:46,996 - INFO - Using existing alias for agent 'researcher' - 2025-05-08 13:34:46,997 - INFO - Successfully configured agent 'researcher' with ID: FF1OSFJIJF and alias: RQVFGLBCZP - 2025-05-08 13:34:46,997 - INFO - Researcher agent created with ID: FF1OSFJIJF and alias: RQVFGLBCZP - - - -```python -# Create writer agent -writer_id = None -writer_alias = None - -if agent_role_arn: - try: - writer_id, writer_alias = create_agent( - bedrock_agent_client, - "writer", - writer_instructions, - writer_functions, - agent_role_arn=agent_role_arn - ) - logging.info(f"Writer agent created with ID: {writer_id} and alias: {writer_alias}") - except Exception as e: - logging.error(f"Failed to create writer agent: {str(e)}") -else: - logging.error("No agent role ARN available for writer agent creation") - -if not any([researcher_id, writer_id]): - # Adjust error message based on whether role setup failed - if not agent_role_arn: - raise RuntimeError("Failed to create agents because IAM role setup failed.") - else: - raise RuntimeError("Failed to create any agents despite successful IAM role setup.") -``` - - 2025-05-08 13:34:47,279 - INFO - Found existing agent 'writer' with ID: JDA8S8SRS1 - 2025-05-08 13:34:48,178 - INFO - Agent JDA8S8SRS1 reached status: PREPARED - 2025-05-08 13:34:48,498 - INFO - Action group already exists for agent 'writer' - 2025-05-08 13:34:48,797 - INFO - Using existing alias for agent 'writer' - 2025-05-08 13:34:48,797 - INFO - Successfully configured agent 'writer' with ID: JDA8S8SRS1 and alias: 3SFKJGSGNQ - 2025-05-08 13:34:48,798 - INFO - Writer agent created with ID: JDA8S8SRS1 and alias: 3SFKJGSGNQ - - -## Test the Agents - -Let's test our agents by asking the researcher agent to search for information and the writer agent to format the results: - - -```python -# Test researcher agent -if researcher_id and researcher_alias: - researcher_response = invoke_agent( - bedrock_runtime_client, - researcher_id, - researcher_alias, - "What is unique about the Cline AI assistant? Use the search_documents function to find relevant information.", - vector_store=vector_store - ) - print("\nResearcher Response:\n", researcher_response) -else: - logging.error("Researcher agent not available for testing") -``` - - 2025-05-08 13:34:48,808 - INFO - Invoking agent with input: What is unique about the Cline AI assistant? Use the search_documents function to find relevant information. 
- 2025-05-08 13:34:51,478 - INFO - Function call: researcher_actions::search_documents - 2025-05-08 13:34:51,478 - INFO - Searching for: What is unique about the Cline AI assistant?, k=3 - 2025-05-08 13:34:52,791 - INFO - Found 3 results - - - - Researcher Response: - Search results for 'What is unique about the Cline AI assistant?': - - Result 1: The Cline AI assistant, developed by Saoud Rizwan, is a unique system that combines vector search capabilities with Amazon Bedrock agents. Unlike traditional chatbots, it uses a sophisticated multi-agent architecture where specialized agents handle different aspects of document processing and interaction. - - Result 2: One of Cline's key features is its ability to create MCP (Model Context Protocol) servers on the fly. This allows users to extend the system's capabilities by adding new tools and resources that connect to external APIs, all while maintaining a secure and non-interactive environment. - - Result 3: The browser automation capabilities in Cline are implemented through Puppeteer, allowing the system to interact with web interfaces in a controlled 900x600 pixel window. This enables testing of web applications, verification of changes, and even general web browsing tasks. - - - - - -```python -# Test writer agent -if writer_id and writer_alias and "researcher_response" in locals(): - writer_response = invoke_agent( - bedrock_runtime_client, - writer_id, - writer_alias, - f"Format this research finding using the format_content function: {researcher_response}", - vector_store=vector_store - ) - print("\nWriter Response:\n", writer_response) -else: - logging.error("Writer agent not available for testing or no researcher response to format") -``` - - 2025-05-08 13:34:52,798 - INFO - Invoking agent with input: Format this research finding using the format_content function: Search results for 'What is unique about the Cline AI assistant?': - - Result 1: The Cline AI assistant, developed by Saoud Rizwan, is a unique system that combines vector search capabilities with Amazon Bedrock agents. Unlike traditional chatbots, it uses a sophisticated multi-agent architecture where specialized agents handle different aspects of document processing and interaction. - - Result 2: One of Cline's key features is its ability to create MCP (Model Context Protocol) servers on the fly. This allows users to extend the system's capabilities by adding new tools and resources that connect to external APIs, all while maintaining a secure and non-interactive environment. - - Result 3: The browser automation capabilities in Cline are implemented through Puppeteer, allowing the system to interact with web interfaces in a controlled 900x600 pixel window. This enables testing of web applications, verification of changes, and even general web browsing tasks. - - - 2025-05-08 13:34:55,730 - INFO - Function call: writer_actions::format_content - 2025-05-08 13:34:55,730 - INFO - Formatting content in summary style - - - - Writer Response: - Formatted in summary style: The Cline AI assistant, developed by Saoud Rizwan, is a unique system that combines vector search capabilities with Amazon Bedrock agents. Unlike traditional chatbots, it uses a sophisticated multi-agent architecture where specialized agents handle different aspects of document processing and interaction. One of Cline's key features is its ability to create MCP (Model Context Protocol) servers on the fly. 
    This allows users to extend the system's capabilities by adding new tools and resources that connect to external APIs, all while maintaining a secure and non-interactive environment. The browser automation capabilities in Cline are implemented through Puppeteer, allowing the system to interact with web interfaces in a controlled 900x600 pixel window. This enables testing of web applications, verification of changes, and even general web browsing tasks.

## Conclusion

In this notebook, we've demonstrated the Custom Control approach for implementing AWS Bedrock agents with Couchbase Vector Search. This approach allows the agent to return control to the application for function execution, providing more flexibility and control over the agent's behavior.

Key components of this implementation include:

1. **Vector Store Setup**: We set up a Couchbase vector store to store and search documents using semantic similarity.
2. **Agent Creation**: We created two specialized agents - a researcher agent for searching documents and a writer agent for formatting results.
3. **Custom Control**: We implemented the Custom Control approach, where the agent returns control to the application for function execution.
4. **Function Handling**: We handled the agent's function calls in the application code, allowing for more control and flexibility.

This approach is particularly useful when you need more control over the agent's behavior or when you want to integrate the agent with existing systems and data sources.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/Bedrock_Agents_Lambda.md b/tutorial/markdown/generated/vector-search-cookbook/Bedrock_Agents_Lambda.md
deleted file mode 100644
index 731a620..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/Bedrock_Agents_Lambda.md
+++ /dev/null
@@ -1,2165 +0,0 @@
---
path: "/tutorial-aws-bedrock-agents-lambda"
title: Building Intelligent Agents with Amazon Bedrock (Lambda)
short_title: AWS Bedrock Agents Lambda Approach
description:
  - Learn how to build intelligent agents using Amazon Bedrock Agents with AWS Lambda and Couchbase as the vector store.
  - This tutorial demonstrates how to create specialized agents that can process documents and interact with external APIs using serverless Lambda functions.
  - You'll understand how to implement secure multi-agent architectures using Amazon Bedrock's agent capabilities with a Lambda-based approach.
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - Artificial Intelligence
  - Amazon Bedrock
sdk_language:
  - python
length: 90 Mins
---

[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/awsbedrock-agents/lambda-approach/Bedrock_Agents_Lambda.ipynb)

# AWS Bedrock Agents with Couchbase Vector Search - Lambda Approach

This notebook demonstrates the Lambda approach for implementing AWS Bedrock agents with Couchbase Vector Search. In this approach, the agent invokes AWS Lambda functions to execute operations.

We'll implement a multi-agent architecture with specialized agents for different tasks:
- **Researcher Agent**: Searches for relevant documents in the vector store
- **Writer Agent**: Formats and presents the research findings

## Alternative Approaches

This notebook demonstrates the Lambda Approach for AWS Bedrock Agents.
For comparison, you might also want to check out the Custom Control Approach, which handles agent tools directly in your application code instead of using AWS Lambda functions. - -The Custom Control approach offers simpler setup and more direct control, but may not scale as well. You can find that implementation here: [Custom Control Approach Notebook](https://developer.couchbase.com/tutorial-aws-bedrock-agents-custom-control) - -Note: If the link above doesn't work in your Jupyter environment, you can navigate to the file manually in the `awsbedrock-agents/custom-control-approach/` directory. - -## Overview - -The Lambda approach for AWS Bedrock Agents delegates the execution of an agent's defined functions (tools) to backend AWS Lambda functions. When the agent decides to use a tool, Bedrock directly invokes the corresponding Lambda function that you've specified in the agent's action group configuration. This Lambda function receives the parameters from the agent, executes the necessary logic (e.g., querying a Couchbase vector store, calling other APIs, performing computations), and then returns the result to the Bedrock Agent. The agent can then use this result to continue its reasoning process or formulate a final response to the user. This architecture promotes a clean separation of concerns, allows tool logic to be developed and scaled independently, and leverages the serverless capabilities of AWS Lambda. - -## Key Steps & Concepts - -1. **Define Agent Instructions & Tool Schema:** - * **Instructions:** Craft a clear prompt that tells the agent its purpose, capabilities, and how it should behave (e.g., "You are a research assistant that uses the SearchAndFormat tool..."). - * **Function Schema:** Define the structure of the tool(s) the agent can use. In this notebook, we define a single tool (e.g., `searchAndFormatDocuments`) that the agent will call. This schema specifies the function name, description, and its input parameters (e.g., `query`, `k`, `style`). This schema acts as the contract between the agent and the Lambda function. - -2. **Implement Lambda Handler Function:** - * Create an AWS Lambda function (e.g., `bedrock_agent_search_and_format.py`) that contains the actual Python code to execute the tool's logic. - * **Event Handling:** The Lambda handler receives an event payload from Bedrock. This payload includes details like the API path (which corresponds to the function name in the schema), HTTP method, and the parameters supplied by the agent. - * **Business Logic:** Inside the Lambda, parse the incoming event, extract parameters, and perform the required actions. For this notebook, this involves: - * Connecting to Couchbase. - * Initializing the Bedrock Embeddings client. - * Performing a vector similarity search using the provided query and `k` value. - * Optionally, formatting the search results based on the `style` parameter (though in this specific example, the formatting is largely illustrative and the LLM does the heavy lifting of presentation). - * **Response Structure:** The Lambda must return a JSON response in a specific format that Bedrock expects. This response typically includes the `actionGroup`, `apiPath`, `httpMethod`, `httpStatusCode`, and a `responseBody` containing the result of the tool execution (e.g., the search results as a string). - * **Deployment:** Package the Lambda function with its dependencies (e.g., `requirements.txt`) into a .zip file. 
This notebook includes helper functions to automate packaging (using a `Makefile`) and deployment, including uploading to S3 if the package is large. The Lambda also needs an IAM role with permissions to run, write logs, and interact with Bedrock and any other required AWS services.
    * **Environment Variables:** The Lambda function is configured with environment variables (e.g., Couchbase connection details, Bedrock model IDs) so it can connect to the necessary services without hardcoding credentials. These are set during the Lambda creation/update process in the notebook.

3. **Create Agent in AWS Bedrock:**
    * Use the `bedrock_agent_client.create_agent` SDK call. Provide the agent name, the ARN of the IAM role it will assume, the foundation model ID (e.g., Claude Sonnet), and the instructions defined in step 1.

4. **Create Agent Action Group (Linking to Lambda):**
    * Use `bedrock_agent_client.create_agent_action_group`.
    * **`actionGroupExecutor`:** This is the crucial part for the Lambda approach. Set it to `{'lambda': 'arn:aws:lambda:<region>:<account-id>:function:<function-name>'}`. This tells Bedrock to invoke your specific Lambda function when this action group is triggered.
    * **`functionSchema`:** Provide the function schema defined in step 1. This allows the agent to understand how to call the Lambda function (i.e., what parameters to send). A minimal sketch of the Lambda handler side of this contract is shown after this list.
    * Give the action group a name (e.g., `SearchAndFormatActionGroup`).

5. **Prepare Agent:**
    * Call `bedrock_agent_client.prepare_agent` with the `agentId`. This makes the DRAFT version of the agent (with its newly configured action group) ready for use. The notebook includes a custom waiter to poll until the agent status is `PREPARED`.

6. **Create or Update Agent Alias:**
    * An alias (e.g., `prod`) is used to invoke a specific version of the agent. The notebook checks whether the alias exists and creates one if not, pointing to the latest prepared (DRAFT) version. Use `bedrock_agent_client.create_agent_alias` or `update_agent_alias`.

7. **Invoke Agent:**
    * Use `bedrock_agent_runtime_client.invoke_agent`, providing the `agentId`, `agentAliasId`, a unique `sessionId`, and the user's `inputText` (prompt).
    * Bedrock takes over: when the agent decides to use the tool from the action group, Bedrock transparently calls the configured Lambda function with the necessary parameters.
    * The Lambda executes, returns its result to the agent, and the agent uses this result to generate its final response.
    * Your application code simply waits for and processes the final streaming response from the `invoke_agent` call. Unlike the Custom Control approach, there is no `returnControl` event for the application to handle; Bedrock manages the Lambda invocation directly.
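To make the handler contract from steps 2 and 4 concrete, here is the minimal sketch referenced above. It assumes an OpenAPI-style action group that passes `apiPath` and `httpMethod` and sends GET-style parameters in `event["parameters"]`; action groups defined with a `functionSchema` receive a `function` field instead and return a slightly different envelope. `run_vector_search` is a hypothetical stand-in for the Couchbase similarity-search logic that the deployed Lambda actually implements.

```python
import json

def run_vector_search(query, k):
    """Hypothetical stand-in for the Couchbase vector search performed by the real Lambda."""
    return [f"placeholder result {i + 1} for {query!r}" for i in range(k)]

def lambda_handler(event, context):
    # Agent-supplied parameters arrive as a list of {"name", "type", "value"} dicts.
    params = {p["name"]: p["value"] for p in event.get("parameters", [])}
    results = run_vector_search(params.get("query", ""), int(params.get("k", 3)))

    # Envelope Bedrock expects back from an OpenAPI-schema action group;
    # the body must be a JSON-encoded string.
    return {
        "messageVersion": "1.0",
        "response": {
            "actionGroup": event.get("actionGroup"),
            "apiPath": event.get("apiPath"),
            "httpMethod": event.get("httpMethod"),
            "httpStatusCode": 200,
            "responseBody": {
                "application/json": {"body": json.dumps({"results": results})}
            },
        },
    }
```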
- -## Cons - -* **Deployment & Configuration Overhead:** Requires setting up, packaging, configuring dependencies, and deploying separate Lambda functions. IAM roles and permissions for Lambdas also need careful management. -* **State Management:** If tools need to share state or complex context with the Lambda function, this must be explicitly passed, often via environment variables or by including necessary lookup logic within the Lambda itself. -* **Cold Starts:** AWS Lambda cold starts can introduce latency the first time a function is invoked after a period of inactivity, potentially affecting agent response time. -* **Debugging Complexity:** Troubleshooting can be more involved as it spans across the Bedrock Agent service, the Lambda service, and potentially other services the Lambda interacts with (like Couchbase). Centralized logging (e.g., CloudWatch Logs for Lambda) is essential. -* **Cost:** Incurs costs associated with Lambda invocations, execution duration, and any resources used by the Lambda (e.g., data transfer, provisioned concurrency if used). - -## 1. Imports - -This section imports all necessary Python libraries. These include: -- Standard libraries: `json` for data handling, `logging` for progress and error messages, `os` for interacting with the operating system (e.g., file paths), `subprocess` for running external commands (like `make` for Lambda packaging), `time` for delays, `traceback` for detailed error reporting, `uuid` for generating unique identifiers, and `shutil` for file operations. -- `boto3` and `botocore`: The AWS SDK for Python, used to interact with AWS services like Bedrock, IAM, Lambda, and S3. Specific configurations (`Config`) and waiters are also imported for robust client interactions. -- `couchbase`: The official Couchbase SDK for Python, used for connecting to and interacting with the Couchbase cluster, including managing buckets, collections, and search indexes. Specific exception classes are imported for error handling. -- `dotenv`: For loading environment variables from a `.env` file, which helps manage configuration settings like API keys and connection strings securely. -- `langchain_aws` and `langchain_couchbase`: Libraries from the LangChain ecosystem. `BedrockEmbeddings` is used to generate text embeddings via Amazon Bedrock, and `CouchbaseSearchVectorStore` provides an interface for using Couchbase as a vector store in LangChain applications. - - -```python -import json -import logging -import os -import shutil -import subprocess -import time -import traceback -import uuid -from datetime import timedelta - -import boto3 -from botocore.config import Config -from botocore.exceptions import ClientError -from botocore.waiter import WaiterModel, create_waiter_with_client -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (BucketNotFoundException, CouchbaseException, - QueryIndexAlreadyExistsException) -from couchbase.management.buckets import BucketSettings, BucketType -from couchbase.management.collections import CollectionSpec -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from dotenv import load_dotenv -from langchain_aws import BedrockEmbeddings -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -``` - -## 2. 
Configuration - -This section handles the initial setup of essential configurations for the notebook: -- **Logging:** Configures the `logging` module to output messages with a specific format (timestamp, level, message), which helps in tracking the script's execution and diagnosing issues. -- **Environment Variables:** Attempts to load environment variables from a `.env` file located either in the current directory or the parent directory. This is a common practice to keep sensitive information like credentials and hostnames out of the codebase. If the `.env` file is not found, the script will rely on variables already set in the execution environment. -- **Couchbase Settings:** Defines variables for connecting to Couchbase, including the host, username, password, and the names for the bucket, scope, collection, and search index that will be used for this experiment. Default values are provided if specific environment variables are not set. -- **AWS Settings:** Defines variables for AWS configuration, such as the region, access key ID, secret access key, and AWS account ID. These are crucial for `boto3` to interact with AWS services. -- **Bedrock Model IDs:** Specifies the model identifiers for the Amazon Bedrock text embedding model (e.g., `amazon.titan-embed-text-v2:0`) and the foundation model to be used by the agent (e.g., `anthropic.claude-3-sonnet-20240229-v1:0`). -- **File Paths:** Sets up variables for various file paths used throughout the notebook, such as the directory for schemas, the path to the Couchbase search index JSON definition, and the path to the JSON file containing documents to be loaded into the vector store. Using `os.getcwd()` makes these paths relative to the notebook's current working directory. - - -```python -# Setup logging -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s') -logger = logging.getLogger(__name__) - -# Load environment variables from project root .env -# In a notebook environment, '__file__' is not defined. Use a relative path or absolute path directly. -# Assuming the notebook is run from the 'lambda-experiments' directory -dotenv_path = os.path.join(os.getcwd(), '.env') # Or specify the full path if needed -logger.info(f"Attempting to load .env file from: {dotenv_path}") -if os.path.exists(dotenv_path): - load_dotenv(dotenv_path=dotenv_path) - logger.info(".env file loaded successfully.") -else: - # Try loading from parent directory if not found in current - parent_dotenv_path = os.path.join(os.path.dirname(os.getcwd()), '.env') - if os.path.exists(parent_dotenv_path): - load_dotenv(dotenv_path=parent_dotenv_path) - logger.info(f".env file loaded successfully from parent directory: {parent_dotenv_path}") - else: - logger.warning(f".env file not found at {dotenv_path} or {parent_dotenv_path}. 
Relying on environment variables.") - - -# Couchbase Configuration -CB_HOST = os.getenv("CB_HOST", "couchbase://localhost") -CB_USERNAME = os.getenv("CB_USERNAME", "Administrator") -CB_PASSWORD = os.getenv("CB_PASSWORD", "password") -# Using a new bucket/scope/collection for experiments to avoid conflicts -CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME", "vector-search-exp") -SCOPE_NAME = os.getenv("SCOPE_NAME", "bedrock_exp") -COLLECTION_NAME = os.getenv("COLLECTION_NAME", "docs_exp") -INDEX_NAME = os.getenv("INDEX_NAME", "vector_search_bedrock_exp") - -# AWS Configuration -AWS_REGION = os.getenv("AWS_REGION", "us-east-1") -AWS_ACCESS_KEY_ID = os.getenv("AWS_ACCESS_KEY_ID") -AWS_SECRET_ACCESS_KEY = os.getenv("AWS_SECRET_ACCESS_KEY") -AWS_ACCOUNT_ID = os.getenv("AWS_ACCOUNT_ID") - -# Bedrock Model IDs -EMBEDDING_MODEL_ID = "amazon.titan-embed-text-v2:0" -AGENT_MODEL_ID = "anthropic.claude-3-sonnet-20240229-v1:0" # Using Sonnet for the agent - -# Paths (relative to the notebook's execution directory) -SCRIPT_DIR = os.getcwd() # Use current working directory for notebook context -SCHEMAS_DIR = os.path.join(SCRIPT_DIR, 'schemas') # New Schemas Dir -SEARCH_FORMAT_SCHEMA_PATH = os.path.join(SCHEMAS_DIR, 'search_and_format_schema.json') # Added -INDEX_JSON_PATH = os.path.join(SCRIPT_DIR, 'aws_index.json') # Keep -DOCS_JSON_PATH = os.path.join(SCRIPT_DIR, 'documents.json') # Changed to load from script's directory -``` - - 2025-06-09 13:39:41,393 - INFO - Attempting to load .env file from: /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/.env - 2025-06-09 13:39:41,395 - INFO - .env file loaded successfully. - - -## 3. Helper Functions - -This section defines a comprehensive suite of helper functions to modularize the various operations required throughout the notebook. These functions encapsulate specific tasks, making the main execution flow cleaner and easier to understand. The categories of helper functions include: - -* **Environment and Client Initialization:** Checking for necessary environment variables and setting up AWS SDK (`boto3`) clients for services like IAM, Lambda, Bedrock, and S3. -* **Couchbase Interaction:** Connecting to the Couchbase cluster, and robustly setting up buckets, scopes, collections, and search indexes. Includes functions to clear data from collections for clean experimental runs. -* **IAM Role Management:** Creating or retrieving the necessary IAM roles with appropriate trust policies and permissions that allow Bedrock Agents and Lambda functions to operate and interact with other AWS services securely. -* **Lambda Function Deployment:** A set of functions to manage the lifecycle of the Lambda function that the agent will invoke. This includes packaging the Lambda code and its dependencies (using a `Makefile`), uploading the deployment package (to S3 if it's large), creating or updating the Lambda function in AWS, and deleting it for cleanup. -* **Bedrock Agent Resource Management:** Functions for creating the Bedrock Agent itself, defining its action groups (which link the agent to the Lambda function via its ARN and define the tool schema), preparing the agent to make it invocable, and managing agent aliases. Also includes functions to delete these agent resources for cleanup. -* **Agent Invocation:** A function to test the fully configured agent by sending it a prompt and processing its streamed response, including any trace information for debugging. 
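Before walking through each helper in detail, the sketch below previews where the setup ends up: with the Lambda approach, the application-side invocation helper reduces to consuming the streamed completion, since Bedrock executes the tool itself. This is a condensed sketch, assuming the `logger` and the `bedrock_agent_runtime_client` initialized by the helpers in this section.

```python
import json
import uuid

def invoke_agent_simple(bedrock_agent_runtime_client, agent_id, alias_id, prompt, session_id=None):
    """Condensed sketch of the agent-invocation helper described above."""
    response = bedrock_agent_runtime_client.invoke_agent(
        agentId=agent_id,
        agentAliasId=alias_id,
        sessionId=session_id or str(uuid.uuid4()),
        inputText=prompt,
        enableTrace=True,
    )
    completion = ""
    for event in response["completion"]:
        if "chunk" in event:
            # The final answer arrives as UTF-8 encoded text chunks.
            completion += event["chunk"]["bytes"].decode("utf-8")
        elif "trace" in event:
            # Trace events expose the agent's reasoning and its Lambda tool calls.
            logger.debug("Agent trace: %s", json.dumps(event["trace"], default=str))
    return completion
```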
### 3.1 check_environment_variables

This function verifies that all critical environment variables required for the script to run (e.g., AWS credentials, Couchbase password, AWS account ID) are set. It logs an error and returns `False` if any are missing; otherwise it logs success and returns `True`.


```python
def check_environment_variables():
    """Check if required environment variables are set."""
    required_vars = ["AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ACCOUNT_ID", "CB_PASSWORD"]
    missing_vars = [var for var in required_vars if not os.getenv(var)]
    if missing_vars:
        logger.error(f"Missing required environment variables: {', '.join(missing_vars)}")
        logger.error("Please set these variables in your environment or .env file")
        return False
    logger.info("All required environment variables are set.")
    return True
```

### 3.2 initialize_aws_clients

This function sets up and returns the AWS SDK (`boto3`) clients used throughout the notebook: Bedrock Runtime (for embeddings and model calls), IAM (for managing roles and policies), Lambda (for deploying and managing Lambda functions), Bedrock Agent (for creating and managing agents), and Bedrock Agent Runtime (for invoking agents). It uses credentials and the region from the environment configuration, and applies a custom configuration (`agent_config`) with longer timeouts and retries to the two agent clients. This matters because Bedrock Agent operations, such as agent preparation, can take significantly longer than typical API calls.


```python
def initialize_aws_clients():
    """Initialize required AWS clients."""
    try:
        logger.info(f"Initializing AWS clients in region: {AWS_REGION}")
        session = boto3.Session(
            aws_access_key_id=AWS_ACCESS_KEY_ID,
            aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
            region_name=AWS_REGION
        )
        # Use a config with longer timeouts for agent operations
        agent_config = Config(
            connect_timeout=120,
            read_timeout=600,  # Agent preparation can take time
            retries={'max_attempts': 5, 'mode': 'adaptive'}
        )
        bedrock_runtime = session.client('bedrock-runtime', region_name=AWS_REGION)
        iam_client = session.client('iam', region_name=AWS_REGION)
        lambda_client = session.client('lambda', region_name=AWS_REGION)
        bedrock_agent_client = session.client('bedrock-agent', region_name=AWS_REGION, config=agent_config)
        bedrock_agent_runtime_client = session.client('bedrock-agent-runtime', region_name=AWS_REGION, config=agent_config)
        logger.info("AWS clients initialized successfully.")
        return bedrock_runtime, iam_client, lambda_client, bedrock_agent_client, bedrock_agent_runtime_client
    except Exception as e:
        logger.error(f"Error initializing AWS clients: {e}")
        raise
```

### 3.3 connect_couchbase

This function establishes a connection to the Couchbase cluster using the connection string (`CB_HOST`), username, and password from the environment configuration. It authenticates with `PasswordAuthenticator` and passes `ClusterOptions`, which is where connection-level settings such as timeouts could be added if needed. It waits for the cluster to be ready before returning the `Cluster` object, ensuring that subsequent operations can be performed reliably. 
- - -```python -def connect_couchbase(): - """Connect to Couchbase cluster.""" - try: - logger.info(f"Connecting to Couchbase cluster at {CB_HOST}...") - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - # Use robust options - options = ClusterOptions( - auth, - ) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=10)) # Wait longer if needed - logger.info("Successfully connected to Couchbase.") - return cluster - except CouchbaseException as e: - logger.error(f"Couchbase connection error: {e}") - raise - except Exception as e: - logger.error(f"Unexpected error connecting to Couchbase: {e}") - raise -``` - -### 3.4 setup_collection - -This comprehensive function is responsible for ensuring that the required Couchbase bucket, scope, and collection are available for the agent's vector store. It performs the following steps idempotently: -- Checks if the specified bucket (`bucket_name`) exists. If not, it creates the bucket with defined settings (e.g., RAM quota, flush enabled). It includes a pause to allow the bucket to become ready. -- Checks if the specified scope (`scope_name`) exists within the bucket. If not, it creates the scope and includes a brief pause. -- Checks if the specified collection (`collection_name`) exists within the scope. If not, it creates the collection using a `CollectionSpec` and pauses. -- Ensures that a primary N1QL index exists on the collection, creating it if it's missing. This is often useful for administrative queries or simpler lookups, though not strictly for vector search itself. -Finally, it returns a `Collection` object representing the target collection for further operations. - -> Note: Bucket Creation will not work on Capella. - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - """Set up Couchbase collection (Original Logic from lamda-approach)""" - logger.info(f"Setting up collection: {bucket_name}/{scope_name}/{collection_name}") - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logger.info(f"Bucket '{bucket_name}' exists.") - except BucketNotFoundException: - logger.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - # Use BucketSettings with potentially lower RAM for experiment - bucket_settings = BucketSettings( - name=bucket_name, - bucket_type=BucketType.COUCHBASE, - ram_quota_mb=256, # Adjusted from 1024 - flush_enabled=True, - num_replicas=0 - ) - try: - cluster.buckets().create_bucket(bucket_settings) - # Wait longer after bucket creation - logger.info(f"Bucket '{bucket_name}' created. Waiting for ready state (10s)...") - time.sleep(10) - bucket = cluster.bucket(bucket_name) # Re-assign bucket object - except Exception as create_e: - logger.error(f"Failed to create bucket '{bucket_name}': {create_e}") - raise - except Exception as e: - logger.error(f"Error getting bucket '{bucket_name}': {e}") - raise - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(s.name == scope_name for s in scopes) - - if not scope_exists: - logger.info(f"Scope '{scope_name}' does not exist. Creating it...") - try: - bucket_manager.create_scope(scope_name) - logger.info(f"Scope '{scope_name}' created. 
Waiting (2s)...") - time.sleep(2) - except CouchbaseException as e: - # Handle potential race condition or already exists error more robustly - if "already exists" in str(e).lower() or "scope_exists" in str(e).lower(): - logger.info(f"Scope '{scope_name}' likely already exists (caught during creation attempt).") - else: - logger.error(f"Failed to create scope '{scope_name}': {e}") - raise - else: - logger.info(f"Scope '{scope_name}' already exists.") - - # Check if collection exists, create if it doesn't - # Re-fetch scopes in case it was just created - scopes = bucket_manager.get_all_scopes() - collection_exists = False - for s in scopes: - if s.name == scope_name: - if any(c.name == collection_name for c in s.collections): - collection_exists = True - break - - if not collection_exists: - logger.info(f"Collection '{collection_name}' does not exist in scope '{scope_name}'. Creating it...") - try: - # Use CollectionSpec - collection_spec = CollectionSpec(collection_name, scope_name) - bucket_manager.create_collection(collection_spec) - logger.info(f"Collection '{collection_name}' created. Waiting (2s)...") - time.sleep(2) - except CouchbaseException as e: - if "already exists" in str(e).lower() or "collection_exists" in str(e).lower(): - logger.info(f"Collection '{collection_name}' likely already exists (caught during creation attempt).") - else: - logger.error(f"Failed to create collection '{collection_name}': {e}") - raise - else: - logger.info(f"Collection '{collection_name}' already exists.") - - # Ensure primary index exists - try: - logger.info(f"Ensuring primary index exists on `{bucket_name}`.`{scope_name}`.`{collection_name}`...") - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logger.info("Primary index present or created successfully.") - except Exception as e: - logger.error(f"Error creating primary index: {str(e)}") - # Decide if this is fatal - - logger.info("Collection setup complete.") - # Return the collection object for use - return cluster.bucket(bucket_name).scope(scope_name).collection(collection_name) - - except Exception as e: - logger.error(f"Error setting up collection: {str(e)}") - logger.error(traceback.format_exc()) - raise -``` - -### 3.5 setup_search_index - -This function is responsible for creating or updating the Couchbase Search (FTS) index required for vector similarity search. Key operations include: -- Loading the index definition from a specified JSON file (`index_definition_path`). -- Dynamically updating the loaded index definition to use the correct `index_name` and `sourceName` (bucket name) provided as arguments. This allows for a template index definition file to be reused. -- Using the `SearchIndexManager` (obtained from the cluster object) to `upsert_index`. Upserting means the index will be created if it doesn't exist, or updated if an index with the same name already exists. This makes the operation idempotent. -- After submitting the upsert operation, it includes a pause (`time.sleep`) to allow Couchbase some time to start the indexing process in the background. - -> **Important Note:** The provided `aws_index.json` file has hardcoded references for the bucket, scope, and collection names. If you have used different names for your bucket, scope, or collection than the defaults specified in this notebook or your `.env` file, you **must** modify the `aws_index.json` file to reflect your custom names before running the next cell. 
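To make that note concrete, here is an illustrative sketch of the shape such an index definition takes. This is not the verbatim contents of `aws_index.json`; the field values below are assumptions to align with your own setup, in particular `name`, `sourceName`, the `scope.collection` key under `types`, and the vector `dims` (1024 for Titan Text Embeddings V2):

```json
{
  "name": "vector_search_bedrock_exp",
  "type": "fulltext-index",
  "sourceType": "gocb",
  "sourceName": "vector-search-exp",
  "planParams": { "indexPartitions": 1 },
  "params": {
    "doc_config": { "mode": "scope.collection.type_field", "type_field": "type" },
    "mapping": {
      "default_mapping": { "dynamic": true, "enabled": false },
      "types": {
        "bedrock_exp.docs_exp": {
          "dynamic": true,
          "enabled": true,
          "properties": {
            "embedding": {
              "enabled": true,
              "fields": [
                { "name": "embedding", "type": "vector", "dims": 1024, "similarity": "dot_product", "index": true }
              ]
            },
            "text": {
              "enabled": true,
              "fields": [
                { "name": "text", "type": "text", "index": true, "store": true }
              ]
            }
          }
        }
      }
    }
  }
}
```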
- - -```python -def setup_search_index(cluster, index_name, bucket_name, scope_name, collection_name, index_definition_path): - """Set up search indexes (Original Logic, adapted) """ - try: - logger.info(f"Looking for index definition at: {index_definition_path}") - if not os.path.exists(index_definition_path): - logger.error(f"Index definition file not found: {index_definition_path}") - raise FileNotFoundError(f"Index definition file not found: {index_definition_path}") - - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - index_definition['name'] = index_name - index_definition['sourceName'] = bucket_name - logger.info(f"Loaded index definition from {index_definition_path}, ensuring name is '{index_name}' and source is '{bucket_name}'.") - - except Exception as e: - logger.error(f"Error loading index definition: {str(e)}") - raise - - try: - # Use the SearchIndexManager from the Cluster object for cluster-level indexes - # Or use scope-level if the index JSON is structured for that - # Assuming cluster level based on original script structure for upsert - search_index_manager = cluster.search_indexes() - - # Create SearchIndex object from potentially modified JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - logger.info(f"Upserting search index '{index_name}'...") - search_index_manager.upsert_index(search_index) - - # Wait for indexing - logger.info(f"Index '{index_name}' upsert operation submitted. Waiting for indexing (10s)...") - time.sleep(10) - - logger.info(f"Search index '{index_name}' setup complete.") - - except QueryIndexAlreadyExistsException: - # This exception might not be correct for SearchIndexManager - # Upsert should handle exists cases, but log potential specific errors - logger.warning(f"Search index '{index_name}' likely already existed (caught QueryIndexAlreadyExistsException, check if applicable). Upsert attempted.") - except CouchbaseException as e: - logger.error(f"Couchbase error during search index setup for '{index_name}': {e}") - raise - except Exception as e: - logger.error(f"Unexpected error during search index setup for '{index_name}': {e}") - raise -``` - -### 3.6 clear_collection - -This utility function is used to delete all documents from a specified Couchbase collection. It constructs and executes a N1QL `DELETE` query targeting the given bucket, scope, and collection. This is useful for ensuring a clean state before loading new data for an experiment, preventing interference from previous runs. It also attempts to log the number of mutations (deleted documents) if the query metrics are available. - - -```python -def clear_collection(cluster, bucket_name, scope_name, collection_name): - """Delete all documents from the specified collection (Original Logic).""" - try: - logger.warning(f"Attempting to clear all documents from `{bucket_name}`.`{scope_name}`.`{collection_name}`...") - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - result = cluster.query(query).execute() - # Try to get mutation count, handle if not available - mutation_count = 0 - try: - metrics_data = result.meta_data().metrics() - if metrics_data: - mutation_count = metrics_data.mutation_count() - except Exception as metrics_e: - logger.warning(f"Could not retrieve mutation count after delete: {metrics_e}") - logger.info(f"Successfully cleared documents from the collection (approx. 
{mutation_count} mutations).") - except Exception as e: - logger.error(f"Error clearing documents from collection: {e}. Collection might be empty or index not ready.") -``` - -### 3.7 create_agent_role - -This function creates or updates the necessary IAM (Identity and Access Management) role that the Bedrock Agent and its associated Lambda function will assume. The role needs permissions to interact with AWS services on your behalf. Key aspects of this function are: -- **Assume Role Policy:** Defines which AWS services (principals) are allowed to assume this role. In this case, it allows both `lambda.amazonaws.com` (for the Lambda function execution) and `bedrock.amazonaws.com` (for the Bedrock Agent service itself). -- **Idempotency:** It first checks if a role with the specified `role_name` already exists. - - If it exists, the function retrieves its ARN and updates its trust policy to ensure it matches the required configuration. - - If it doesn't exist, it creates a new IAM role with the defined assume role policy and description. -- **Permissions Policies:** - - Attaches the AWS managed policy `AWSLambdaBasicExecutionRole`, which grants the Lambda function permissions to write logs to CloudWatch. - - Creates and attaches an inline policy (`LambdaBasicLoggingPermissions`) for more specific logging permissions if needed, scoped to the Lambda log group. - - Creates and attaches an inline policy (`BedrockAgentPermissions`) granting broad `bedrock:*` permissions. For production, these permissions should be scoped down to the minimum required. -- **Propagation Delays:** Includes `time.sleep` calls after creating the role and after attaching policies to allow time for the changes to propagate within AWS, which helps prevent subsequent operations from failing due to eventual consistency issues. -It returns the ARN (Amazon Resource Name) of the created or updated IAM role, which is then used when creating the Bedrock Agent and the Lambda function. - - -```python -def create_agent_role(iam_client, role_name, aws_account_id): - """Creates or gets the IAM role for the Bedrock Agent Lambda functions.""" - logger.info(f"Checking/Creating IAM role: {role_name}") - assume_role_policy_document = { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Principal": { - "Service": [ - "lambda.amazonaws.com", - "bedrock.amazonaws.com" - ] - }, - "Action": "sts:AssumeRole" - } - ] - } - - role_arn = None - try: - # Check if role exists - get_role_response = iam_client.get_role(RoleName=role_name) - role_arn = get_role_response['Role']['Arn'] - logger.info(f"IAM role '{role_name}' already exists with ARN: {role_arn}") - - # Ensure trust policy is up-to-date - logger.info(f"Updating trust policy for existing role '{role_name}'...") - iam_client.update_assume_role_policy( - RoleName=role_name, - PolicyDocument=json.dumps(assume_role_policy_document) - ) - logger.info(f"Trust policy updated for role '{role_name}'.") - - except iam_client.exceptions.NoSuchEntityException: - logger.info(f"IAM role '{role_name}' not found. 
Creating...") - try: - create_role_response = iam_client.create_role( - RoleName=role_name, - AssumeRolePolicyDocument=json.dumps(assume_role_policy_document), - Description='IAM role for Bedrock Agent Lambda functions (Experiment)', - MaxSessionDuration=3600 - ) - role_arn = create_role_response['Role']['Arn'] - logger.info(f"Successfully created IAM role '{role_name}' with ARN: {role_arn}") - # Wait after role creation before attaching policies - logger.info("Waiting 15s for role creation propagation...") - time.sleep(15) - except ClientError as e: - logger.error(f"Error creating IAM role '{role_name}': {e}") - raise - - except ClientError as e: - logger.error(f"Error getting/updating IAM role '{role_name}': {e}") - raise - - # Attach basic execution policy (idempotent) - try: - logger.info(f"Attaching basic Lambda execution policy to role '{role_name}'...") - iam_client.attach_role_policy( - RoleName=role_name, - PolicyArn='arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole' - ) - logger.info("Attached basic Lambda execution policy.") - except ClientError as e: - logger.error(f"Error attaching basic Lambda execution policy: {e}") - # Don't necessarily raise, might already be attached or other issue - - # Add minimal inline policy for logging (can be expanded later if needed) - basic_inline_policy_name = "LambdaBasicLoggingPermissions" - basic_inline_policy_doc = { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Action": [ - "logs:CreateLogGroup", - "logs:CreateLogStream", - "logs:PutLogEvents" - ], - "Resource": f"arn:aws:logs:{AWS_REGION}:{aws_account_id}:log-group:/aws/lambda/*:*" # Scope down logs if possible - } - # Add S3 permissions here ONLY if Lambda code explicitly needs it - ] - } - - # Add Bedrock permissions policy - bedrock_policy_name = "BedrockAgentPermissions" - bedrock_policy_doc = { - "Version": "2012-10-17", - "Statement": [ - { - "Effect": "Allow", - "Action": [ - "bedrock:*" - ], - "Resource": "*" # You can scope this down to specific agents/models if needed - } - ] - } - try: - logger.info(f"Putting basic inline policy '{basic_inline_policy_name}' for role '{role_name}'...") - iam_client.put_role_policy( - RoleName=role_name, - PolicyName=basic_inline_policy_name, - PolicyDocument=json.dumps(basic_inline_policy_doc) - ) - logger.info(f"Successfully put inline policy '{basic_inline_policy_name}'.") - - # Add Bedrock permissions policy - logger.info(f"Putting Bedrock permissions policy '{bedrock_policy_name}' for role '{role_name}'...") - iam_client.put_role_policy( - RoleName=role_name, - PolicyName=bedrock_policy_name, - PolicyDocument=json.dumps(bedrock_policy_doc) - ) - logger.info(f"Successfully put inline policy '{bedrock_policy_name}'.") - - logger.info("Waiting 10s for policy changes to propagate...") - time.sleep(10) - except ClientError as e: - logger.error(f"Error putting inline policy: {e}") - # Decide if this is fatal - - if not role_arn: - raise Exception(f"Failed to create or retrieve ARN for role {role_name}") - - return role_arn -``` - -### 3.8 Lambda Deployment Functions - -This subsection groups together several helper functions dedicated to managing the deployment lifecycle of the AWS Lambda function that will serve as the tool executor for the Bedrock Agent. These functions handle packaging the Lambda code, managing its dependencies, deploying it to AWS, and cleaning up resources. - -#### 3.8.1 delete_lambda_function - -This function is designed to safely delete an AWS Lambda function. 
Before attempting to delete the function itself, it tries to remove any permissions associated with it (specifically, the permission allowing Bedrock to invoke it, identified by a predictable statement ID). It then checks whether the function exists and, if so, proceeds with the deletion. The function includes a brief pause after initiating deletion, since the process is asynchronous. It returns `True` if deletion was attempted and `False` if the function didn't exist or an error occurred during the process.


```python
def delete_lambda_function(lambda_client, function_name):
    """Delete Lambda function if it exists, attempting to remove permissions first."""
    logger.info(f"Attempting to delete Lambda function: {function_name}...")
    try:
        # Use a predictable statement ID added by create_lambda_function
        statement_id = f"AllowBedrockInvokeBasic-{function_name}"
        try:
            logger.info(f"Attempting to remove permission {statement_id} from {function_name}...")
            lambda_client.remove_permission(
                FunctionName=function_name,
                StatementId=statement_id
            )
            logger.info(f"Successfully removed permission {statement_id} from {function_name}.")
            time.sleep(2)  # Allow time for permission removal
        except lambda_client.exceptions.ResourceNotFoundException:
            logger.info(f"Permission {statement_id} not found on {function_name}. Skipping removal.")
        except ClientError as perm_e:
            # Log error but continue with deletion attempt
            logger.warning(f"Error removing permission {statement_id} from {function_name}: {str(perm_e)}")

        # Check if function exists before attempting deletion
        lambda_client.get_function(FunctionName=function_name)
        logger.info(f"Function {function_name} exists. Deleting...")
        lambda_client.delete_function(FunctionName=function_name)

        # Deletion is asynchronous; pause briefly instead of polling with a waiter
        logger.info(f"Waiting for {function_name} to be deleted...")
        time.sleep(10)  # Simple delay after delete call
        logger.info(f"Function {function_name} deletion initiated.")

        return True  # Indicates deletion was attempted/occurred

    except lambda_client.exceptions.ResourceNotFoundException:
        logger.info(f"Lambda function '{function_name}' does not exist. No need to delete.")
        return False  # Indicates function didn't exist
    except Exception as e:
        logger.error(f"Error during deletion process for Lambda function '{function_name}': {str(e)}")
        # Depending on severity, might want to raise or just return False
        return False  # Indicates an error occurred beyond not-found
```

#### 3.8.2 upload_to_s3

This function handles uploading a Lambda deployment package (a .zip file) to Amazon S3, which is necessary when the package size exceeds the direct upload limit for Lambda. Key features include:
- **Bucket Management:** It generates a unique S3 bucket name (prefixed with `lambda-deployment-`) from the AWS account ID and a timestamp, falling back to a UUID if the account ID isn't available. It checks whether this bucket exists and creates it if not, specifying the correct region for bucket creation, and uses a waiter to ensure the bucket is available before proceeding.
- **S3 Key Generation:** Creates a unique S3 key (object path) for the uploaded file, incorporating the original filename and a UUID to prevent collisions. 
- **Multipart Upload:** For files larger than 100 MB it uses `boto3.s3.transfer.S3Transfer` for robust multipart uploads; smaller files go through a standard `put_object` call. (Note that this is a separate threshold from the deployment-path decision: `create_lambda_function` routes any package over ~45 MB through S3 in the first place, since Lambda's direct-upload limit for a zipped package is around 50 MB.)
- **Retry Configuration:** Initializes the S3 and STS clients with a configuration that includes increased timeouts and retries for better resilience.

It returns a dictionary containing the `S3Bucket` and `S3Key` of the uploaded package, which is then used by `create_lambda_function`.


```python
def upload_to_s3(zip_file, region, bucket_name=None):
    """Upload zip file to S3 with retry logic and return S3 location."""
    logger.info(f"Preparing to upload {zip_file} to S3 in region {region}...")
    # Configure the client with increased timeouts
    config = Config(
        connect_timeout=60,
        read_timeout=300,
        retries={'max_attempts': 3, 'mode': 'adaptive'}
    )

    s3_client = boto3.client('s3', region_name=region, config=config)
    sts_client = boto3.client('sts', region_name=region, config=config)

    # Determine bucket name
    if bucket_name is None:
        try:
            account_id = sts_client.get_caller_identity().get('Account')
            timestamp = int(time.time())
            bucket_name = f"lambda-deployment-{account_id}-{timestamp}"
            logger.info(f"Generated unique S3 bucket name: {bucket_name}")
        except Exception as e:
            fallback_id = uuid.uuid4().hex[:12]
            bucket_name = f"lambda-deployment-{fallback_id}"
            logger.warning(f"Error getting account ID ({e}). Using fallback bucket name: {bucket_name}")

    # Create bucket if needed
    try:
        s3_client.head_bucket(Bucket=bucket_name)
        logger.info(f"Using existing S3 bucket: {bucket_name}")
    except ClientError as e:
        error_code = int(e.response['Error']['Code'])
        if error_code == 404:
            logger.info(f"Creating S3 bucket: {bucket_name}...")
            try:
                if region == 'us-east-1':
                    s3_client.create_bucket(Bucket=bucket_name)
                else:
                    s3_client.create_bucket(
                        Bucket=bucket_name,
                        CreateBucketConfiguration={'LocationConstraint': region}
                    )
                logger.info(f"Created S3 bucket: {bucket_name}. 
Waiting for availability...") - waiter = s3_client.get_waiter('bucket_exists') - waiter.wait(Bucket=bucket_name, WaiterConfig={'Delay': 5, 'MaxAttempts': 12}) - logger.info(f"Bucket {bucket_name} is available.") - except Exception as create_e: - logger.error(f"Error creating bucket '{bucket_name}': {create_e}") - raise - else: - logger.error(f"Error checking bucket '{bucket_name}': {e}") - raise - - # Upload file - s3_key = f"lambda/{os.path.basename(zip_file)}-{uuid.uuid4().hex[:8]}" - try: - logger.info(f"Uploading {zip_file} to s3://{bucket_name}/{s3_key}...") - file_size = os.path.getsize(zip_file) - if file_size > 100 * 1024 * 1024: # Use multipart for files > 100MB - logger.info("Using multipart upload for large file...") - transfer_config = boto3.s3.transfer.TransferConfig( - multipart_threshold=10 * 1024 * 1024, max_concurrency=10, - multipart_chunksize=10 * 1024 * 1024, use_threads=True - ) - s3_transfer = boto3.s3.transfer.S3Transfer(client=s3_client, config=transfer_config) - s3_transfer.upload_file(zip_file, bucket_name, s3_key) - else: - with open(zip_file, 'rb') as f: - s3_client.put_object(Bucket=bucket_name, Key=s3_key, Body=f) - - logger.info(f"Successfully uploaded to s3://{bucket_name}/{s3_key}") - return {'S3Bucket': bucket_name, 'S3Key': s3_key} - - except Exception as upload_e: - logger.error(f"S3 upload failed: {upload_e}") - raise -``` - -#### 3.8.3 package_function - -This function automates the process of packaging the Lambda function code and its dependencies into a .zip file, ready for deployment. It relies on a `Makefile` located in the `source_dir` (which is `lambda_functions` in this notebook). The steps are: -1. **Path Setup:** Defines various paths for source files, the temporary packaging directory, the `Makefile`, and the final output .zip file. -2. **File Preparation:** It copies the specific Lambda handler script (e.g., `bedrock_agent_search_and_format.py`) to `lambda_function.py` within the `source_dir` because the `Makefile` is likely configured to look for a generic `lambda_function.py`. -3. **Execute Makefile:** It runs a `make clean package` command using `subprocess.check_call`. The `make` command is executed with the `source_dir` as its current working directory. The Makefile is responsible for creating a virtual environment, installing dependencies from `requirements.txt` into a temporary `package_dir`, and then zipping the contents of this directory along with `lambda_function.py` into `lambda_package.zip` within the `source_dir`. -4. **Output Handling:** After the `make` command successfully completes, it moves and renames the generated `lambda_package.zip` from the `source_dir` to the specified `build_dir` (the notebook's current directory in this case) with a name like `function_name.zip`. -5. **Cleanup:** In a `finally` block, it cleans up the temporary `lambda_function.py` copied earlier and any intermediate `lambda_package.zip` left in the `source_dir` (e.g., if the rename/move failed). -The function returns the path to the final .zip file. 
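The `Makefile` itself is not shown in this notebook, but a minimal version consistent with the behavior described above might look like the following. This is a hypothetical sketch only; the actual `Makefile` in `lambda_functions/` may differ in target names and details:

```make
# Hypothetical sketch of the packaging Makefile described above.
PACKAGE_DIR = package_dir
ZIP_FILE   = lambda_package.zip

clean:
	rm -rf $(PACKAGE_DIR) $(ZIP_FILE)

package: clean
	mkdir -p $(PACKAGE_DIR)
	# Install dependencies into the staging directory
	pip install -r requirements.txt --target $(PACKAGE_DIR)
	# Zip the dependencies, then add the handler on top
	cd $(PACKAGE_DIR) && zip -r ../$(ZIP_FILE) .
	zip -g $(ZIP_FILE) lambda_function.py
```

The packaging function that drives this Makefile follows.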
- - -```python -def package_function(function_name, source_dir, build_dir): - """Package Lambda function using Makefile found in source_dir.""" - # source_dir is where the .py, requirements.txt, Makefile live (e.g., lambda_functions) - # build_dir is where packaging happens and final zip ends up (e.g., lambda-experiments) - makefile_path = os.path.join(source_dir, 'Makefile') - # Temp build dir inside source_dir, as Makefile expects relative paths - temp_package_dir = os.path.join(source_dir, 'package_dir') - # Requirements file is in source_dir - source_req_path = os.path.join(source_dir, 'requirements.txt') - # Target requirements path inside source_dir (needed for Makefile) - # target_req_path = os.path.join(source_dir, 'requirements.txt') # No copy needed if running make in source_dir - source_func_script_path = os.path.join(source_dir, f'{function_name}.py') - # Target function script path inside source_dir, renamed for Makefile install_deps copy - target_func_script_path = os.path.join(source_dir, 'lambda_function.py') - # Make output zip is created inside source_dir - make_output_zip = os.path.join(source_dir, 'lambda_package.zip') - # Final zip path is in the build_dir (one level up from source_dir) - final_zip_path = os.path.join(build_dir, f'{function_name}.zip') - - logger.info(f"--- Packaging function {function_name} --- ") - logger.info(f"Source Dir (Makefile location & make cwd): {source_dir}") - logger.info(f"Build Dir (Final zip location): {build_dir}") - - if not os.path.exists(source_func_script_path): - raise FileNotFoundError(f"Source function script not found: {source_func_script_path}") - if not os.path.exists(source_req_path): - raise FileNotFoundError(f"Source requirements file not found: {source_req_path}") - if not os.path.exists(makefile_path): - raise FileNotFoundError(f"Makefile not found at: {makefile_path}") - - # Ensure no leftover target script from previous failed run - if os.path.exists(target_func_script_path): - logger.warning(f"Removing existing target script: {target_func_script_path}") - os.remove(target_func_script_path) - - try: - # 1. No need to create lambda subdir in build_dir - - # 2. Copy source function script to source_dir as lambda_function.py - logger.info(f"Copying {source_func_script_path} to {target_func_script_path}") - shutil.copy(source_func_script_path, target_func_script_path) - # Requirements file is already in source_dir, no copy needed. - - # 3. Run make command (execute from source_dir where Makefile is) - make_command = [ - 'make', - '-f', makefile_path, # Still specify Makefile path explicitly - 'clean', # Clean first - 'package', - # 'PYTHON_VERSION=python3.9' # Let Makefile use its default or system default - ] - logger.info(f"Running make command: {' '.join(make_command)} (in {source_dir})") - # Run make from source_dir; relative paths in Makefile should now work - subprocess.check_call(make_command, cwd=source_dir, stdout=subprocess.DEVNULL, stderr=subprocess.PIPE) - logger.info("Make command completed successfully.") - - # 4. 
Check for output zip in source_dir and rename/move to build_dir - if not os.path.exists(make_output_zip): - raise FileNotFoundError(f"Makefile did not produce expected output: {make_output_zip}") - - logger.info(f"Moving and renaming {make_output_zip} to {final_zip_path}") - if os.path.exists(final_zip_path): - logger.warning(f"Removing existing final zip: {final_zip_path}") - os.remove(final_zip_path) - # Use shutil.move for cross-filesystem safety if needed, os.rename is fine here - os.rename(make_output_zip, final_zip_path) - logger.info(f"Zip file ready: {final_zip_path}") - - return final_zip_path - - except subprocess.CalledProcessError as e: - logger.error(f"Error running Makefile for {function_name}: {e}") - stderr_output = "(No stderr captured)" - if e.stderr: - try: - stderr_output = e.stderr.decode() - except Exception: - stderr_output = "(Could not decode stderr)" - logger.error(f"Make stderr: {stderr_output}") - raise - except Exception as e: - logger.error(f"Error packaging function {function_name} using Makefile: {str(e)}") - logger.error(traceback.format_exc()) - raise - finally: - # 5. Clean up intermediate files in source_dir - if os.path.exists(target_func_script_path): - logger.info(f"Cleaning up temporary script: {target_func_script_path}") - os.remove(target_func_script_path) - if os.path.exists(make_output_zip): # If rename failed - logger.warning(f"Cleaning up intermediate zip in source dir: {make_output_zip}") - os.remove(make_output_zip) -``` - -#### 3.8.4 create_lambda_function - -This is a key function that handles the creation or update of the AWS Lambda function. It incorporates several important aspects for robustness and proper configuration: -- **Package Handling:** It checks the size of the deployment .zip file. If it's over a threshold (45MB in this code, as Lambda has limits for direct uploads), it calls `upload_to_s3` to upload the package to S3 and uses the S3 location for deployment. Otherwise, it reads the .zip file content directly for deployment. -- **Configuration:** Defines common arguments for Lambda creation/update, including the function name, runtime (`python3.9`), IAM role ARN, handler name, timeout, memory size, and crucial environment variables (Couchbase details, Bedrock model IDs) that the Lambda will need at runtime. -- **Idempotency & Retry Logic:** It first attempts to create the Lambda function. - - If it encounters a `ResourceConflictException` (meaning the function already exists), it then attempts to update the function's code and configuration. - - It includes a retry loop for both creation and update operations to handle potential throttling or other transient AWS issues, with an exponential backoff strategy. -- **Permissions:** After successfully creating or updating the Lambda, it adds a resource-based policy (permission) to the Lambda function. This permission specifically allows the Bedrock service (`bedrock.amazonaws.com`) to invoke this Lambda function. It uses a predictable `StatementId` and handles potential conflicts if the permission already exists. -- **Waiters:** It uses `boto3` waiters (`function_active_v2` after creation, `function_updated_v2` after update) to pause execution until the Lambda function becomes fully active and ready, preventing issues where subsequent operations might target a Lambda that isn't fully initialized. -The function returns the ARN of the successfully created or updated Lambda function. 
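For context, the Lambda being deployed here must speak Bedrock's function-details protocol: the agent sends an event identifying the action group, function name, and parameters, and expects a structured response containing the tool's output. The sketch below is not the actual `bedrock_agent_search_and_format.py` (which performs the Couchbase vector search); it only illustrates the expected event and response shapes, with a placeholder where the real search logic would go. With that contract in mind, the deployment function follows.

```python
# Minimal sketch of a Bedrock Agents (function details) Lambda handler.
# Not the real handler from lambda_functions/; shown only to illustrate
# the event/response contract used by the agent.
def lambda_handler(event, context):
    # Bedrock passes parameters as a list of {'name', 'type', 'value'} dicts
    params = {p['name']: p['value'] for p in event.get('parameters', [])}
    query = params.get('query', '')
    k = int(params.get('k', 3))
    style = params.get('style', 'bullet points')

    # Real implementation: embed the query, run the Couchbase vector search,
    # and format the top-k hits in the requested style.
    result_text = f"(placeholder) top {k} results for '{query}' as {style}"

    return {
        'messageVersion': '1.0',
        'response': {
            'actionGroup': event['actionGroup'],
            'function': event['function'],
            'functionResponse': {
                'responseBody': {
                    'TEXT': {'body': result_text}
                }
            }
        }
    }
```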
- - -```python -def create_lambda_function(lambda_client, function_name, handler, role_arn, zip_file, region): - """Create or update Lambda function with retry logic.""" - logger.info(f"Deploying Lambda function {function_name} from {zip_file}...") - - # Configure the client with increased timeouts for potentially long creation - config = Config( - connect_timeout=120, - read_timeout=300, - retries={'max_attempts': 5, 'mode': 'adaptive'} - ) - lambda_client_local = boto3.client('lambda', region_name=region, config=config) - - # Check zip file size - zip_size_mb = 0 - try: - zip_size_bytes = os.path.getsize(zip_file) - zip_size_mb = zip_size_bytes / (1024 * 1024) - logger.info(f"Zip file size: {zip_size_mb:.2f} MB") - except OSError as e: - logger.error(f"Could not get size of zip file {zip_file}: {e}") - raise # Cannot proceed without zip file - - use_s3 = zip_size_mb > 45 # Use S3 for packages over ~45MB - s3_location = None - zip_content = None - - if use_s3: - logger.info(f"Package size ({zip_size_mb:.2f} MB) requires S3 deployment.") - s3_location = upload_to_s3(zip_file, region) - if not s3_location: - raise Exception("Failed to upload Lambda package to S3.") - else: - logger.info("Deploying package directly.") - try: - with open(zip_file, 'rb') as f: - zip_content = f.read() - except OSError as e: - logger.error(f"Could not read zip file {zip_file}: {e}") - raise - - # Define common create/update args - common_args = { - 'FunctionName': function_name, - 'Runtime': 'python3.9', - 'Role': role_arn, - 'Handler': handler, - 'Timeout': 180, - 'MemorySize': 1536, # Adjust as needed - # Env vars loaded from main script env or .env - 'Environment': { - 'Variables': { - 'CB_HOST': os.getenv('CB_HOST', 'couchbase://localhost'), - 'CB_USERNAME': os.getenv('CB_USERNAME', 'Administrator'), - 'CB_PASSWORD': os.getenv('CB_PASSWORD', 'password'), - 'CB_BUCKET_NAME': os.getenv('CB_BUCKET_NAME', 'vector-search-exp'), - 'SCOPE_NAME': os.getenv('SCOPE_NAME', 'bedrock_exp'), - 'COLLECTION_NAME': os.getenv('COLLECTION_NAME', 'docs_exp'), - 'INDEX_NAME': os.getenv('INDEX_NAME', 'vector_search_bedrock_exp'), - 'EMBEDDING_MODEL_ID': os.getenv('EMBEDDING_MODEL_ID', EMBEDDING_MODEL_ID), - 'AGENT_MODEL_ID': os.getenv('AGENT_MODEL_ID', AGENT_MODEL_ID) - } - } - } - - if use_s3: - code_arg = {'S3Bucket': s3_location['S3Bucket'], 'S3Key': s3_location['S3Key']} - else: - code_arg = {'ZipFile': zip_content} - - max_retries = 3 - base_delay = 10 - for attempt in range(1, max_retries + 1): - try: - logger.info(f"Creating function '{function_name}' (attempt {attempt}/{max_retries})...") - create_args = common_args.copy() - create_args['Code'] = code_arg - create_args['Publish'] = True # Publish a version - - create_response = lambda_client_local.create_function(**create_args) - function_arn = create_response['FunctionArn'] - logger.info(f"Successfully created function '{function_name}' with ARN: {function_arn}") - - # Add basic invoke permission after creation - time.sleep(5) # Give function time to be fully created before adding policy - statement_id = f"AllowBedrockInvokeBasic-{function_name}" - try: - logger.info(f"Adding basic invoke permission ({statement_id}) to {function_name}...") - lambda_client_local.add_permission( - FunctionName=function_name, - StatementId=statement_id, - Action='lambda:InvokeFunction', - Principal='bedrock.amazonaws.com' - ) - logger.info(f"Successfully added basic invoke permission {statement_id}.") - except lambda_client_local.exceptions.ResourceConflictException: - 
logger.info(f"Permission {statement_id} already exists for {function_name}. Skipping add.") - except ClientError as perm_e: - logger.warning(f"Failed to add basic invoke permission {statement_id} to {function_name}: {perm_e}") - - # Wait for function to be Active - logger.info(f"Waiting for function '{function_name}' to become active...") - waiter = lambda_client_local.get_waiter('function_active_v2') - waiter.wait(FunctionName=function_name, WaiterConfig={'Delay': 5, 'MaxAttempts': 24}) - logger.info(f"Function '{function_name}' is active.") - - return function_arn # Return ARN upon successful creation - - except lambda_client_local.exceptions.ResourceConflictException: - logger.warning(f"Function '{function_name}' already exists. Attempting to update code...") - try: - if use_s3: - update_response = lambda_client_local.update_function_code( - FunctionName=function_name, - S3Bucket=s3_location['S3Bucket'], - S3Key=s3_location['S3Key'], - Publish=True - ) - else: - update_response = lambda_client_local.update_function_code( - FunctionName=function_name, - ZipFile=zip_content, - Publish=True - ) - function_arn = update_response['FunctionArn'] - logger.info(f"Successfully updated function code for '{function_name}'. New version ARN: {function_arn}") - - # Also update configuration just in case - try: - logger.info(f"Updating configuration for '{function_name}'...") - lambda_client_local.update_function_configuration(**common_args) - logger.info(f"Configuration updated for '{function_name}'.") - except ClientError as conf_e: - logger.warning(f"Could not update configuration for '{function_name}': {conf_e}") - - # Re-verify invoke permission after update - time.sleep(5) - statement_id = f"AllowBedrockInvokeBasic-{function_name}" - try: - logger.info(f"Verifying/Adding basic invoke permission ({statement_id}) after update...") - lambda_client_local.add_permission( - FunctionName=function_name, - StatementId=statement_id, - Action='lambda:InvokeFunction', - Principal='bedrock.amazonaws.com' - ) - logger.info(f"Successfully added/verified basic invoke permission {statement_id}.") - except lambda_client_local.exceptions.ResourceConflictException: - logger.info(f"Permission {statement_id} already exists for {function_name}. Skipping add.") - except ClientError as perm_e: - logger.warning(f"Failed to add/verify basic invoke permission {statement_id} after update: {perm_e}") - - # Wait for function to be Active after update - logger.info(f"Waiting for function '{function_name}' update to complete...") - waiter = lambda_client_local.get_waiter('function_updated_v2') - waiter.wait(FunctionName=function_name, WaiterConfig={'Delay': 5, 'MaxAttempts': 24}) - logger.info(f"Function '{function_name}' update complete.") - - return function_arn # Return ARN after successful update - - except ClientError as update_e: - logger.error(f"Failed to update function '{function_name}': {update_e}") - if attempt < max_retries: - delay = base_delay * (2 ** (attempt - 1)) - logger.info(f"Retrying update in {delay} seconds...") - time.sleep(delay) - else: - logger.error("Maximum update retries reached. 
Deployment failed.") - raise update_e - - except ClientError as e: - # Handle throttling or other retryable errors - error_code = e.response.get('Error', {}).get('Code') - if error_code in ['ThrottlingException', 'ProvisionedConcurrencyConfigNotFoundException', 'EC2ThrottledException'] or 'Rate exceeded' in str(e): - logger.warning(f"Retryable error on attempt {attempt}: {e}") - if attempt < max_retries: - delay = base_delay * (2 ** (attempt - 1)) + (uuid.uuid4().int % 5) - logger.info(f"Retrying in {delay} seconds...") - time.sleep(delay) - else: - logger.error("Maximum retries reached after retryable error. Deployment failed.") - raise e - else: - logger.error(f"Error creating/updating Lambda '{function_name}': {e}") - logger.error(traceback.format_exc()) # Log full traceback for unexpected errors - raise e # Re-raise non-retryable or unexpected errors - except Exception as e: - logger.error(f"Unexpected error during Lambda deployment: {e}") - logger.error(traceback.format_exc()) - raise e - - # If loop completes without returning, something went wrong - raise Exception(f"Failed to deploy Lambda function {function_name} after {max_retries} attempts.") -``` - -### 3.9 Agent Resource Deletion Functions - -This subsection provides helper functions to manage the cleanup of AWS Bedrock Agent resources. Creating agents, action groups, and aliases results in persistent configurations in AWS. These functions are essential for maintaining a clean environment, especially during experimentation and development, by allowing for the removal of these resources when they are no longer needed or before recreating them in a subsequent run. - -#### 3.9.1 get_agent_by_name - -This utility function searches for an existing Bedrock Agent by its name. Since the AWS SDK's `get_agent` requires an `agentId`, and you often work with human-readable names, this function bridges that gap. It uses the `list_agents` operation (with a paginator to handle potentially many agents in an account) and iterates through the summaries, comparing the `agentName` field. If a match is found, it returns the corresponding `agentId`. If no agent with the given name is found or an error occurs during listing, it returns `None`. - - -```python -def get_agent_by_name(agent_client, agent_name): - """Find an agent ID by its name using list_agents.""" - logger.info(f"Attempting to find agent by name: {agent_name}") - try: - paginator = agent_client.get_paginator('list_agents') - for page in paginator.paginate(): - for agent_summary in page.get('agentSummaries', []): - if agent_summary.get('agentName') == agent_name: - agent_id = agent_summary.get('agentId') - logger.info(f"Found agent '{agent_name}' with ID: {agent_id}") - return agent_id - logger.info(f"Agent '{agent_name}' not found.") - return None - except ClientError as e: - logger.error(f"Error listing agents to find '{agent_name}': {e}") - return None # Treat as not found if error occurs -``` - -#### 3.9.2 delete_action_group - -This function handles the deletion of a specific action group associated with a Bedrock Agent. Action groups are always tied to the `DRAFT` version of an agent. It calls `delete_agent_action_group`, providing the `agentId`, `agentVersion='DRAFT'`, and the `actionGroupId`. It uses `skipResourceInUseCheck=True` to force deletion, which can be useful if the agent is in a state (like `PREPARING`) that might otherwise prevent immediate deletion. 
The function includes error handling for cases where the action group is not found or if a conflict occurs (e.g., agent is busy), attempting a retry after a delay in case of a conflict. It returns `True` if deletion was successful or the group was not found, and `False` if an unrecoverable error occurred. - - -```python -def delete_action_group(agent_client, agent_id, action_group_id): - """Deletes a specific action group for an agent.""" - logger.info(f"Attempting to delete action group {action_group_id} for agent {agent_id}...") - try: - agent_client.delete_agent_action_group( - agentId=agent_id, - agentVersion='DRAFT', # Action groups are tied to the DRAFT version - actionGroupId=action_group_id, - skipResourceInUseCheck=True # Force deletion even if in use (e.g., during prepare) - ) - logger.info(f"Successfully deleted action group {action_group_id} for agent {agent_id}.") - time.sleep(5) # Short pause after deletion - return True - except agent_client.exceptions.ResourceNotFoundException: - logger.info(f"Action group {action_group_id} not found for agent {agent_id}. Skipping deletion.") - return False - except ClientError as e: - # Handle potential throttling or conflict if prepare is happening - error_code = e.response.get('Error', {}).get('Code') - if error_code == 'ConflictException': - logger.warning(f"Conflict deleting action group {action_group_id} (agent might be preparing/busy). Retrying once after delay...") - time.sleep(15) - try: - agent_client.delete_agent_action_group( - agentId=agent_id, agentVersion='DRAFT', actionGroupId=action_group_id, skipResourceInUseCheck=True - ) - logger.info(f"Successfully deleted action group {action_group_id} after retry.") - return True - except Exception as retry_e: - logger.error(f"Error deleting action group {action_group_id} on retry: {retry_e}") - return False - else: - logger.error(f"Error deleting action group {action_group_id} for agent {agent_id}: {e}") - return False -``` - -#### 3.9.3 delete_agent_and_resources - -This function orchestrates the complete cleanup of a Bedrock Agent and its associated components. Its process is: -1. **Find Agent:** It first calls `get_agent_by_name` to retrieve the `agentId` for the specified `agent_name`. If the agent isn't found, it exits gracefully. -2. **Delete Action Groups:** It lists all action groups associated with the `DRAFT` version of the agent. For each action group found, it calls `delete_action_group` to remove it. -3. **Delete Agent:** After attempting to delete all action groups, it proceeds to delete the agent itself using `delete_agent` with `skipResourceInUseCheck=True` to force the deletion. -4. **Wait for Deletion:** It includes a custom polling loop to wait for the agent to be fully deleted by repeatedly calling `get_agent` and checking for a `ResourceNotFoundException`. This ensures that subsequent operations (like recreating an agent with the same name) are less likely to encounter conflicts. - - -```python -def delete_agent_and_resources(agent_client, agent_name): - """Deletes the agent and its associated action groups.""" - agent_id = get_agent_by_name(agent_client, agent_name) - if not agent_id: - logger.info(f"Agent '{agent_name}' not found, no deletion needed.") - return - - logger.warning(f"--- Deleting Agent Resources for '{agent_name}' (ID: {agent_id}) ---") - - # 1. 
Delete Action Groups - try: - logger.info(f"Listing action groups for agent {agent_id}...") - action_groups = agent_client.list_agent_action_groups( - agentId=agent_id, - agentVersion='DRAFT' # List groups for the DRAFT version - ).get('actionGroupSummaries', []) - - if action_groups: - logger.info(f"Found {len(action_groups)} action groups to delete.") - for ag in action_groups: - delete_action_group(agent_client, agent_id, ag['actionGroupId']) - else: - logger.info("No action groups found to delete.") - - except ClientError as e: - logger.error(f"Error listing action groups for agent {agent_id}: {e}") - # Continue to agent deletion attempt even if listing fails - - # 2. Delete the Agent - try: - logger.info(f"Attempting to delete agent {agent_id} ('{agent_name}')...") - agent_client.delete_agent(agentId=agent_id, skipResourceInUseCheck=True) # Force delete - - # Wait for agent deletion (custom waiter logic might be needed if no standard waiter) - logger.info(f"Waiting up to 2 minutes for agent {agent_id} deletion...") - deleted = False - for _ in range(24): # Check every 5 seconds for 2 minutes - try: - agent_client.get_agent(agentId=agent_id) - time.sleep(5) - except agent_client.exceptions.ResourceNotFoundException: - logger.info(f"Agent {agent_id} successfully deleted.") - deleted = True - break - except ClientError as e: - # Handle potential throttling during check - error_code = e.response.get('Error', {}).get('Code') - if error_code == 'ThrottlingException': - logger.warning("Throttled while checking agent deletion status, continuing wait...") - time.sleep(10) - else: - logger.error(f"Error checking agent deletion status: {e}") - # Break checking loop on unexpected error - break - if not deleted: - logger.warning(f"Agent {agent_id} deletion confirmation timed out.") - - except agent_client.exceptions.ResourceNotFoundException: - logger.info(f"Agent {agent_id} ('{agent_name}') already deleted or not found.") - except ClientError as e: - logger.error(f"Error deleting agent {agent_id}: {e}") - - logger.info(f"--- Agent Resource Deletion Complete for '{agent_name}' ---") -``` - -### 3.10 Agent Creation Functions - -This subsection contains functions dedicated to the setup and configuration of the Bedrock Agent itself, including its core definition, action groups that link it to tools (Lambda functions), and the preparation process that makes it ready for invocation. - -#### 3.10.1 create_agent - -This function creates a new Bedrock Agent. It takes the desired `agent_name`, the `agent_role_arn` (obtained from `create_agent_role`), and the `foundation_model_id` (e.g., for Claude Sonnet) as input. Key configurations include: -- **Instruction:** A detailed prompt that defines the agent's persona, capabilities, and how it should use its tools. The instruction in this notebook guides the agent to use a single "SearchAndFormat" tool and present results directly. -- **`idleSessionTTLInSeconds`:** Sets a timeout for how long an agent session can remain idle. -- **Description:** A brief description for the agent. -After calling `create_agent`, the function logs the initial response details (ID, ARN, status). It then enters a polling loop to wait until the agent's status moves out of the `CREATING` state, typically to `NOT_PREPARED`. If the agent creation fails and enters a `FAILED` state, it raises an exception. It returns the `agent_id` and `agent_arn` upon successful initiation of creation. 
```python
def create_agent(agent_client, agent_name, agent_role_arn, foundation_model_id):
    """Creates a new Bedrock Agent."""
    logger.info(f"--- Creating Agent: {agent_name} ---")
    try:
        # Updated instruction for the single tool
        instruction = (
            "You are a helpful research assistant. Your primary function is to use the SearchAndFormat tool "
            "to find relevant documents based on user queries and format them. "
            "Use the user's query for the search, and specify a formatting style if requested, otherwise use the default. "
            "Present the formatted results returned by the tool directly to the user. "
            "Only use the tool provided. Do not add your own knowledge."
        )

        response = agent_client.create_agent(
            agentName=agent_name,
            agentResourceRoleArn=agent_role_arn,
            foundationModel=foundation_model_id,
            instruction=instruction,
            idleSessionTTLInSeconds=1800,  # 30 minutes
            description=f"Experimental agent for Couchbase search and content formatting ({foundation_model_id})"
            # promptOverrideConfiguration={}  # Optional: add later if needed
        )
        agent_info = response.get('agent')
        agent_id = agent_info.get('agentId')
        agent_arn = agent_info.get('agentArn')
        agent_status = agent_info.get('agentStatus')
        logger.info(f"Agent creation initiated. Name: {agent_name}, ID: {agent_id}, ARN: {agent_arn}, Status: {agent_status}")

        # Wait for the agent to leave the CREATING state; it should land on
        # NOT_PREPARED, the initial state after creation. Custom polling is
        # used since there is no standard waiter for this transition.
        logger.info(f"Waiting for agent {agent_id} to reach initial state...")
        for _ in range(12):  # Check for up to 1 minute
            current_status = agent_client.get_agent(agentId=agent_id)['agent']['agentStatus']
            logger.info(f"Agent {agent_id} status: {current_status}")
            if current_status != 'CREATING':  # Expect NOT_PREPARED or FAILED
                break
            time.sleep(5)

        final_status = agent_client.get_agent(agentId=agent_id)['agent']['agentStatus']
        if final_status == 'FAILED':
            logger.error(f"Agent {agent_id} creation failed.")
            # Optionally retrieve failure reasons if the API provides them
            raise Exception(f"Agent creation failed for {agent_name}")
        else:
            logger.info(f"Agent {agent_id} successfully created (Status: {final_status}).")

        return agent_id, agent_arn

    except ClientError as e:
        logger.error(f"Error creating agent '{agent_name}': {e}")
        raise
```

#### 3.10.2 create_action_group

This function creates or updates an action group for the specified agent. Action groups define the tools an agent can use. In this Lambda-based approach, the action group links the agent to the Lambda function that implements the tool. Key steps include:
- **Function Schema Definition:** It programmatically defines a `function_schema_details` dictionary. This schema describes the tool (`searchAndFormatDocuments`) that the Lambda function provides, including its name, description, and expected input parameters (`query`, `k`, `style`) with their types and whether they are required. This schema is what the agent uses to understand how to invoke the tool.
- **Idempotency:** It first checks if an action group with the given `action_group_name` already exists for the `DRAFT` version of the agent.
  - If it exists, it attempts to update the existing action group using `update_agent_action_group`, ensuring the `actionGroupExecutor` points to the correct Lambda ARN and that it uses the `functionSchema` (for defining the tool via its signature) rather than an OpenAPI schema. 
  - If it doesn't exist, it creates a new action group using `create_agent_action_group`.
- **`actionGroupExecutor`:** This is set to `{'lambda': function_arn}`, where `function_arn` is the ARN of the deployed Lambda function. This tells Bedrock to invoke this Lambda when the agent decides to use a tool from this action group.
- **`functionSchema` Parameter:** The `functionSchema` (containing the `function_schema_details`) is provided to the `create_agent_action_group` or `update_agent_action_group` call. This method of defining tools is simpler for single functions compared to providing a full OpenAPI schema, which is also an option for more complex APIs.
- **State:** The action group is explicitly set to `ENABLED`.
A brief pause is added after creation/update to allow changes to propagate. The function returns the `actionGroupId`.


```python
def create_action_group(agent_client, agent_id, action_group_name, function_arn, schema_path=None):
    """Creates an action group for the agent using Define with function details."""
    logger.info(f"--- Creating/Updating Action Group (Function Details): {action_group_name} for Agent: {agent_id} ---")
    logger.info(f"Lambda ARN: {function_arn}")

    # Define function schema details (for functionSchema parameter)
    function_schema_details = {
        'functions': [
            {
                'name': 'searchAndFormatDocuments', # Function name agent will call
                'description': 'Performs vector search based on query, retrieves documents, and formats results using specified style.',
                'parameters': {
                    'query': {
                        'description': 'The search query text.',
                        'type': 'string',
                        'required': True
                    },
                    'k': {
                        'description': 'The maximum number of documents to retrieve.',
                        'type': 'integer',
                        'required': False # Making optional as Lambda has default
                    },
                    'style': {
                        'description': 'The desired formatting style for the results (e.g., \'bullet points\', \'paragraph\', \'summary\').',
                        'type': 'string',
                        'required': False # Making optional as Lambda has default
                    }
                }
            }
        ]
    }

    try:
        # Check if Action Group already exists for the DRAFT version
        try:
            logger.info(f"Checking if action group '{action_group_name}' already exists for agent {agent_id} DRAFT version...")
            paginator = agent_client.get_paginator('list_agent_action_groups')
            existing_group = None
            for page in paginator.paginate(agentId=agent_id, agentVersion='DRAFT'):
                for ag_summary in page.get('actionGroupSummaries', []):
                    if ag_summary.get('actionGroupName') == action_group_name:
                        existing_group = ag_summary
                        break
                if existing_group:
                    break

            if existing_group:
                ag_id = existing_group['actionGroupId']
                logger.warning(f"Action Group '{action_group_name}' (ID: {ag_id}) already exists for agent {agent_id} DRAFT. Attempting update to Function Details.")
                # Update existing action group - REMOVE apiSchema, ADD functionSchema
                response = agent_client.update_agent_action_group(
                    agentId=agent_id,
                    agentVersion='DRAFT',
                    actionGroupId=ag_id,
                    actionGroupName=action_group_name,
                    actionGroupExecutor={'lambda': function_arn},
                    functionSchema={ # Use functionSchema
                        'functions': function_schema_details['functions'] # Pass the list with the correct key
                    },
                    actionGroupState='ENABLED'
                )
                ag_info = response.get('agentActionGroup')
                logger.info(f"Successfully updated Action Group '{action_group_name}' (ID: {ag_info.get('actionGroupId')}) to use Function Details.")
                return ag_info.get('actionGroupId')
            else:
                logger.info(f"Action group '{action_group_name}' does not exist. Creating new with Function Details.")

        except ClientError as e:
            logger.error(f"Error checking for existing action group '{action_group_name}': {e}. Proceeding with creation attempt.")


        # Create new action group if not found or update failed implicitly
        response = agent_client.create_agent_action_group(
            agentId=agent_id,
            agentVersion='DRAFT',
            actionGroupName=action_group_name,
            actionGroupExecutor={
                'lambda': function_arn
            },
            functionSchema={ # Use functionSchema
                'functions': function_schema_details['functions'] # Pass the list with the correct key
            },
            actionGroupState='ENABLED'
        )
        ag_info = response.get('agentActionGroup')
        ag_id = ag_info.get('actionGroupId')
        logger.info(f"Successfully created Action Group '{action_group_name}' with ID: {ag_id} using Function Details.")
        time.sleep(5) # Pause after creation/update
        return ag_id

    except ClientError as e:
        logger.error(f"Error creating/updating action group '{action_group_name}' using Function Details: {e}")
        raise
```

#### 3.10.3 prepare_agent

This function initiates the preparation of the `DRAFT` version of the Bedrock Agent and waits for this process to complete. Preparation involves Bedrock compiling the agent's configuration (instructions, action groups, model settings) and making it ready for invocation.
- It calls `bedrock_agent_client.prepare_agent(agentId=agent_id)`.
- **Custom Waiter:** It then uses a custom-defined `boto3` waiter (`AgentPrepared`) to poll the agent's status. The waiter configuration specifies:
  - `delay`: How often to check (e.g., every 30 seconds).
  - `operation`: The SDK call to make for checking (`GetAgent`).
  - `maxAttempts`: How many times to check before timing out (e.g., 20 attempts, for a total of up to 10 minutes).
  - `acceptors`: Conditions that determine success, failure, or retry. It succeeds if `agent.agentStatus` becomes `PREPARED`, fails if it becomes `FAILED`, and retries if it's `UPDATING` (though `PREPARING` is the more typical intermediate state here).
If the waiter times out or the agent preparation results in a `FAILED` status, an exception is raised. This step is crucial because an agent cannot be invoked (or an alias reliably pointed to its version) until it is successfully prepared.


```python
def prepare_agent(agent_client, agent_id):
    """Prepares the DRAFT version of the agent."""
    logger.info(f"--- Preparing Agent: {agent_id} ---")
    try:
        response = agent_client.prepare_agent(agentId=agent_id)
        agent_version = response.get('agentVersion') # Should be DRAFT
        prepared_at = response.get('preparedAt')
        status = response.get('agentStatus') # Should be PREPARING
        logger.info(f"Agent preparation initiated for version '{agent_version}'. Status: {status}. Prepared At: {prepared_at}")

        # Wait for preparation to complete (PREPARED or FAILED)
        logger.info(f"Waiting for agent {agent_id} preparation to complete (up to 10 minutes)...")
        # Define a simple waiter config
        waiter_config = {
            'version': 2,
            'waiters': {
                'AgentPrepared': {
                    'delay': 30, # Check every 30 seconds
                    'operation': 'GetAgent',
                    'maxAttempts': 20, # Max 10 minutes
                    'acceptors': [
                        {
                            'matcher': 'path',
                            'expected': 'PREPARED',
                            'argument': 'agent.agentStatus',
                            'state': 'success'
                        },
                        {
                            'matcher': 'path',
                            'expected': 'FAILED',
                            'argument': 'agent.agentStatus',
                            'state': 'failure'
                        },
                        {
                            'matcher': 'path',
                            'expected': 'UPDATING', # Can happen during prep? Treat as retryable
                            'argument': 'agent.agentStatus',
                            'state': 'retry'
                        }
                    ]
                }
            }
        }
        waiter_model = WaiterModel(waiter_config)
        custom_waiter = create_waiter_with_client('AgentPrepared', waiter_model, agent_client)

        try: # Wait for the custom waiter to reach a terminal state
            custom_waiter.wait(agentId=agent_id)
            logger.info(f"Agent {agent_id} successfully prepared.")

        except Exception as e: # Waiter timed out or the agent reached FAILED
            logger.error(f"Agent {agent_id} preparation failed or timed out: {e}")
            # Check final status if possible
            try:
                final_status = agent_client.get_agent(agentId=agent_id)['agent']['agentStatus']
                logger.error(f"Final agent status: {final_status}")
            except Exception as get_e:
                logger.error(f"Could not retrieve final agent status after wait failure: {get_e}")
            raise Exception(f"Agent preparation failed for {agent_id}")

    except Exception as e:
        logger.error(f"Error preparing agent {agent_id}: {e}")
        # Handle error, maybe exit
        raise e # Re-raise the exception
```

### 3.11 Agent Invocation Function

This subsection provides the function used to interact with the prepared and aliased Bedrock Agent, sending it a prompt and processing its response.

#### 3.11.1 test_agent_invocation

This function is responsible for invoking the configured Bedrock Agent and handling its response. Key operations include:
- **Invocation:** Calls `bedrock_agent_runtime_client.invoke_agent` with the `agentId`, `agentAliasId`, a unique `sessionId` (generated for each invocation in this script), the user's `prompt` (inputText), and `enableTrace=True` to get detailed trace information for debugging.
- **Stream Processing:** The agent's response is a stream. The function iterates through the events in this stream (`response.get('completion', [])`).
  - **`chunk` events:** These contain parts of the agent's textual response. The function decodes these byte chunks (UTF-8) and concatenates them to form the `completion_text`.
  - **`trace` events:** If `enableTrace` was true, these events provide detailed insight into the agent's internal operations, such as which foundation model was called, the input to the model, the agent's rationale, and tool activity (in the Lambda approach the tool call itself is executed by Bedrock invoking the Lambda, but the trace can show the agent's decision to call it and the result it returned). The function collects these trace parts.
- **Logging:** It logs the final combined `completion_text` and a summary of the trace events, which can be very helpful for understanding the agent's decision-making process and debugging any issues with tool invocation or response generation.
It returns the final textual response from the agent.


```python
def test_agent_invocation(agent_runtime_client, agent_id, agent_alias_id, session_id, prompt):
    """Invokes the agent and prints the response."""
    logger.info(f"--- Testing Agent Invocation (Agent ID: {agent_id}, Alias: {agent_alias_id}) ---")
    logger.info(f"Session ID: {session_id}")
    logger.info(f"Prompt: \"{prompt}\"")

    try:
        response = agent_runtime_client.invoke_agent(
            agentId=agent_id,
            agentAliasId=agent_alias_id,
            sessionId=session_id,
            inputText=prompt,
            enableTrace=True # Enable trace for debugging
        )

        logger.info("Agent invocation successful. Processing response...")
        completion_text = ""
        trace_events = []

        # The response is a stream. Iterate through the chunks.
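        # 'chunk' events carry UTF-8 encoded fragments of the agent's final
        # answer; 'trace' events appear because enableTrace=True and carry
        # step-by-step debugging detail. Any other event type is just logged.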
        for event in response.get('completion', []):
            if 'chunk' in event:
                data = event['chunk'].get('bytes', b'')
                decoded_chunk = data.decode('utf-8')
                completion_text += decoded_chunk
            elif 'trace' in event:
                trace_part = event['trace'].get('trace')
                if trace_part:
                    trace_events.append(trace_part)
            else:
                logger.warning(f"Unhandled event type in stream: {event}")

        # Log final combined response
        logger.info(f"--- Agent Final Response ---{completion_text}")

        # Keep trace summary log (optional, can be removed if too verbose)
        if trace_events:
            logger.info("--- Invocation Trace Summary ---")
            for i, trace in enumerate(trace_events):
                trace_type = trace.get('type')
                step_type = trace.get('orchestration', {}).get('stepType')
                model_invocation_input = trace.get('modelInvocationInput')
                if model_invocation_input:
                    fm_input = model_invocation_input.get('text',
                        json.dumps(model_invocation_input.get('invocationInput',{}).get('toolConfiguration',{})) # Handle tool input
                    )
                log_line = f"Trace {i+1}: Type={trace_type}, Step={step_type}"
                rationale = trace.get('rationale', {}).get('text')
                if rationale: log_line += f", Rationale=\"{rationale[:100]}...\""
                logger.info(log_line) # Log summary line

        return completion_text

    except ClientError as e:
        logger.error(f"Error invoking agent: {e}")
        logger.error(traceback.format_exc())
        return None
    except Exception as e:
        logger.error(f"Unexpected error during agent invocation: {e}")
        logger.error(traceback.format_exc())
        return None
```

## 4. Main Execution Flow

This is the primary section of the notebook, where all the previously defined helper functions are called in sequence to set up the complete Bedrock Agent environment with a Lambda-backed tool and then test its invocation. The flow is designed to be largely idempotent: it can usually be re-run, and it will clean up or reuse existing resources (e.g., IAM roles, Lambda functions, agents) before creating new ones. The major steps are outlined below:

### 4.1 Initial Setup

This first step in the main execution flow performs essential preliminary tasks:
1. Logs a starting message for the script execution.
2. Calls `check_environment_variables()` to ensure all required environment variables (AWS credentials, Couchbase password, etc.) are set. If not, it raises an `EnvironmentError` to halt execution, as the subsequent steps depend on these variables.
3. Calls `initialize_aws_clients()` to get the necessary `boto3` client objects for Bedrock, IAM, Lambda, etc.
4. Calls `connect_couchbase()` to establish a connection to the Couchbase cluster.
If any of these critical initialization steps fail, an exception is raised to stop the notebook's execution, preventing errors in later stages.


```python
logger.info("--- Starting Bedrock Agent Experiment Script ---")

if not check_environment_variables():
    # In a notebook, raising an exception is preferable to exit(1)
    raise EnvironmentError("Missing required environment variables. Check logs.")

# Initialize all clients, including the agent client
try:
    bedrock_runtime_client, iam_client, lambda_client, bedrock_agent_client, bedrock_agent_runtime_client = initialize_aws_clients()
    cb_cluster = connect_couchbase()
    logger.info("AWS clients and Couchbase connection initialized.")
except Exception as e:
    logger.error(f"Initialization failed: {e}")
    raise # Re-raise the exception to stop execution
```

    2025-06-09 13:39:41,643 - INFO - --- Starting Bedrock Agent Experiment Script ---
    2025-06-09 13:39:41,644 - INFO - All required environment variables are set.
    2025-06-09 13:39:41,644 - INFO - Initializing AWS clients in region: us-east-1
    2025-06-09 13:39:42,002 - INFO - AWS clients initialized successfully.
    2025-06-09 13:39:42,002 - INFO - Connecting to Couchbase cluster at couchbases://cb.hlcup4o4jmjr55yf.cloud.couchbase.com...
    2025-06-09 13:39:44,131 - INFO - Successfully connected to Couchbase.
    2025-06-09 13:39:44,132 - INFO - AWS clients and Couchbase connection initialized.


### 4.2 Couchbase Setup

This block focuses on preparing the Couchbase environment to serve as the vector store for the agent. It involves:
1. Calling `setup_collection()`: This helper function ensures that the target Couchbase bucket, scope, and collection (defined by `CB_BUCKET_NAME`, `SCOPE_NAME`, `COLLECTION_NAME`) are created if they don't already exist. It also ensures a primary index is present on the collection.
2. Calling `setup_search_index()`: This creates or updates the Couchbase Full-Text Search (FTS) index (named by `INDEX_NAME`) using the definition from `INDEX_JSON_PATH`. This search index is crucial for performing vector similarity searches.
3. Calling `clear_collection()`: This function deletes all existing documents from the target collection. This step ensures that each run of the notebook starts with a clean slate, preventing data from previous experiments from interfering with the current one.
If any part of this Couchbase setup fails, an exception is logged and re-raised to stop further execution.


```python
try:
    # Use the setup functions with the script's config variables
    cb_collection = setup_collection(cb_cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
    logger.info(f"Couchbase collection '{CB_BUCKET_NAME}.{SCOPE_NAME}.{COLLECTION_NAME}' setup complete.")

    # Pass required args to setup_search_index
    setup_search_index(cb_cluster, INDEX_NAME, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME, INDEX_JSON_PATH)
    logger.info(f"Couchbase search index '{INDEX_NAME}' setup complete.")

    # Clear any existing documents from previous runs
    clear_collection(cb_cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
    logger.info("Cleared any existing documents from the collection.")
except Exception as e:
    logger.error(f"Couchbase setup failed: {e}")
    raise
```

    2025-06-09 13:39:44,140 - INFO - Setting up collection: vector-search-testing/shared/bedrock
    2025-06-09 13:39:45,245 - INFO - Bucket 'vector-search-testing' exists.
    2025-06-09 13:39:46,219 - INFO - Scope 'shared' already exists.
    2025-06-09 13:39:47,149 - INFO - Collection 'bedrock' already exists.
    2025-06-09 13:39:47,152 - INFO - Ensuring primary index exists on `vector-search-testing`.`shared`.`bedrock`...
    2025-06-09 13:39:48,185 - INFO - Primary index present or created successfully.
    2025-06-09 13:39:48,186 - INFO - Collection setup complete.
    2025-06-09 13:39:48,187 - INFO - Couchbase collection 'vector-search-testing.shared.bedrock' setup complete.
    2025-06-09 13:39:48,187 - INFO - Looking for index definition at: /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/aws_index.json
    2025-06-09 13:39:48,192 - INFO - Loaded index definition from /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/aws_index.json, ensuring name is 'vector_search_bedrock' and source is 'vector-search-testing'.
    2025-06-09 13:39:48,193 - INFO - Upserting search index 'vector_search_bedrock'...
    2025-06-09 13:39:48,880 - WARNING - Search index 'vector_search_bedrock' likely already existed (caught QueryIndexAlreadyExistsException, check if applicable). Upsert attempted.
    2025-06-09 13:39:48,881 - INFO - Couchbase search index 'vector_search_bedrock' setup complete.
    2025-06-09 13:39:48,881 - WARNING - Attempting to clear all documents from `vector-search-testing`.`shared`.`bedrock`...
    2025-06-09 13:39:49,141 - WARNING - Could not retrieve mutation count after delete: 'list' object has no attribute 'meta_data'
    2025-06-09 13:39:49,142 - INFO - Successfully cleared documents from the collection (approx. 0 mutations).
    2025-06-09 13:39:49,143 - INFO - Cleared any existing documents from the collection.


### 4.3 Vector Store Initialization and Data Loading

With the Couchbase infrastructure in place, this section prepares the LangChain vector store and populates it with data:
1. **Initialize `BedrockEmbeddings`:** Creates an instance of the `BedrockEmbeddings` client, specifying the `EMBEDDING_MODEL_ID` (e.g., Amazon Titan Text Embeddings V2). This client will be used by the vector store to convert text documents into numerical embeddings for similarity searching.
2. **Initialize `CouchbaseSearchVectorStore`:** Creates an instance of `CouchbaseSearchVectorStore`. This LangChain component acts as an abstraction layer over the Couchbase collection and search index, providing methods for adding documents and performing similarity searches. It's configured with the Couchbase cluster connection, bucket/scope/collection names, the embeddings client, and the search index name.
3. **Load Documents from JSON:** Reads document data from the `DOCS_JSON_PATH` file. This file is expected to contain a list of documents, each with `text` and `metadata` fields.
4. **Add Documents to Vector Store:** If documents are loaded, their texts and metadatas are extracted. The `vector_store.add_texts()` method is then called to process these documents: each document's text is converted into an embedding (using the `BedrockEmbeddings` client), and both the text and its embedding (along with metadata) are stored in the Couchbase collection. The search index (`INDEX_NAME`) is then updated to include these new vectors, making them searchable.
Error handling is included to catch issues like file not found or problems during embedding generation or data insertion.

>Note: `documents.json` contains the documents that we want to load into our vector store.
As an example, we have added a few documents to the file from [https://cline.bot/](https://cline.bot/) -Let's load the documents from the documents.json file and add them to our vector store: - - -```python -try: - logger.info(f"Initializing Bedrock Embeddings client with model: {EMBEDDING_MODEL_ID}") - embeddings = BedrockEmbeddings( - client=bedrock_runtime_client, - model_id=EMBEDDING_MODEL_ID - ) - logger.info("Successfully created Bedrock embeddings client.") - - logger.info(f"Initializing CouchbaseSearchVectorStore with index: {INDEX_NAME}") - vector_store = CouchbaseSearchVectorStore( - cluster=cb_cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME - ) - logger.info("Successfully created Couchbase vector store.") - - # Load documents from JSON file - logger.info(f"Looking for documents at: {DOCS_JSON_PATH}") - if not os.path.exists(DOCS_JSON_PATH): - logger.error(f"Documents file not found: {DOCS_JSON_PATH}") - raise FileNotFoundError(f"Documents file not found: {DOCS_JSON_PATH}") - - with open(DOCS_JSON_PATH, 'r') as f: - data = json.load(f) - documents_to_load = data.get('documents', []) - logger.info(f"Loaded {len(documents_to_load)} documents from {DOCS_JSON_PATH}") - - # Add documents to vector store - if documents_to_load: - logger.info(f"Adding {len(documents_to_load)} documents to vector store...") - texts = [doc.get('text', '') for doc in documents_to_load] - metadatas = [] - for i, doc in enumerate(documents_to_load): - metadata_raw = doc.get('metadata', {}) - if isinstance(metadata_raw, str): - try: - metadata = json.loads(metadata_raw) - if not isinstance(metadata, dict): - logger.warning(f"Metadata for doc {i} parsed from string is not a dict: {metadata}. Using empty dict.") - metadata = {} - except json.JSONDecodeError: - logger.warning(f"Could not parse metadata string for doc {i}: {metadata_raw}. Using empty dict.") - metadata = {} - elif isinstance(metadata_raw, dict): - metadata = metadata_raw - else: - logger.warning(f"Metadata for doc {i} is not a string or dict: {metadata_raw}. Using empty dict.") - metadata = {} - metadatas.append(metadata) - - inserted_ids = vector_store.add_texts(texts=texts, metadatas=metadatas) - logger.info(f"Successfully added {len(inserted_ids)} documents to the vector store.") - else: - logger.warning("No documents found in the JSON file to add.") - -except FileNotFoundError as e: - logger.error(f"Setup failed: {e}") - raise -except Exception as e: - logger.error(f"Error during vector store setup or data loading: {e}") - logger.error(traceback.format_exc()) - raise - -logger.info("--- Couchbase Setup and Data Loading Complete ---") -``` - - 2025-06-09 13:39:49,152 - INFO - Initializing Bedrock Embeddings client with model: amazon.titan-embed-text-v2:0 - 2025-06-09 13:39:49,153 - INFO - Successfully created Bedrock embeddings client. - 2025-06-09 13:39:49,153 - INFO - Initializing CouchbaseSearchVectorStore with index: vector_search_bedrock - 2025-06-09 13:39:52,549 - INFO - Successfully created Couchbase vector store. - 2025-06-09 13:39:52,549 - INFO - Looking for documents at: /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/documents.json - 2025-06-09 13:39:52,551 - INFO - Loaded 7 documents from /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/documents.json - 2025-06-09 13:39:52,551 - INFO - Adding 7 documents to vector store... 
    2025-06-09 13:39:56,544 - INFO - Successfully added 7 documents to the vector store.
    2025-06-09 13:39:56,545 - INFO - --- Couchbase Setup and Data Loading Complete ---


### 4.4 Create IAM Role

This step ensures that the necessary IAM (Identity and Access Management) role for the Bedrock Agent and its Lambda function is in place.
- It defines an `agent_role_name` (e.g., `bedrock_agent_lambda_exp_role`).
- It calls the `create_agent_role()` helper function. This function (described in section 3.7) either creates a new IAM role with this name or updates an existing one.
- The role is configured with a trust policy allowing both the Bedrock service and the Lambda service to assume it.
- It attaches necessary permissions policies, including `AWSLambdaBasicExecutionRole` for Lambda logging and custom inline policies for Bedrock access and any other required permissions.
- The AWS Account ID, needed for defining precise resource ARNs in policies, is fetched dynamically using the STS client if not already available as an environment variable.
The ARN of this role (`agent_role_arn`) is stored, as it's a required parameter for creating both the Bedrock Agent and the AWS Lambda function that the agent will invoke.


```python
agent_role_name = "bedrock_agent_lambda_exp_role"
try:
    # Ensure AWS_ACCOUNT_ID is loaded correctly
    if not AWS_ACCOUNT_ID:
        logger.info("Attempting to fetch AWS Account ID...")
        sts_client = boto3.client('sts', region_name=AWS_REGION)
        AWS_ACCOUNT_ID = sts_client.get_caller_identity().get('Account')
        if not AWS_ACCOUNT_ID:
            raise ValueError("AWS Account ID could not be determined. Please set the AWS_ACCOUNT_ID environment variable.")
        logger.info(f"Fetched AWS Account ID: {AWS_ACCOUNT_ID}")

    agent_role_arn = create_agent_role(iam_client, agent_role_name, AWS_ACCOUNT_ID)
    logger.info(f"Agent IAM Role ARN: {agent_role_arn}")
except Exception as e:
    logger.error(f"Failed to create/verify IAM role: {e}")
    logger.error(traceback.format_exc())
    raise
```

    2025-06-09 13:39:56,553 - INFO - Checking/Creating IAM role: bedrock_agent_lambda_exp_role
    2025-06-09 13:39:57,454 - INFO - IAM role 'bedrock_agent_lambda_exp_role' already exists with ARN: arn:aws:iam::598307997273:role/bedrock_agent_lambda_exp_role
    2025-06-09 13:39:57,454 - INFO - Updating trust policy for existing role 'bedrock_agent_lambda_exp_role'...
    2025-06-09 13:39:57,710 - INFO - Trust policy updated for role 'bedrock_agent_lambda_exp_role'.
    2025-06-09 13:39:57,710 - INFO - Attaching basic Lambda execution policy to role 'bedrock_agent_lambda_exp_role'...
    2025-06-09 13:39:57,973 - INFO - Attached basic Lambda execution policy.
    2025-06-09 13:39:57,974 - INFO - Putting basic inline policy 'LambdaBasicLoggingPermissions' for role 'bedrock_agent_lambda_exp_role'...
    2025-06-09 13:39:58,240 - INFO - Successfully put inline policy 'LambdaBasicLoggingPermissions'.
    2025-06-09 13:39:58,240 - INFO - Putting Bedrock permissions policy 'BedrockAgentPermissions' for role 'bedrock_agent_lambda_exp_role'...
    2025-06-09 13:39:58,607 - INFO - Successfully put inline policy 'BedrockAgentPermissions'.
    2025-06-09 13:39:58,608 - INFO - Waiting 10s for policy changes to propagate...
    2025-06-09 13:40:08,612 - INFO - Agent IAM Role ARN: arn:aws:iam::598307997273:role/bedrock_agent_lambda_exp_role


### 4.5 Deploy Lambda Function

This section orchestrates the deployment of the AWS Lambda function that will execute the agent's `searchAndFormatDocuments` tool. The process involves several steps managed by the helper functions:
1. **Define Lambda Details:** Specifies the `search_format_lambda_name` (e.g., `bedrock_agent_search_format_exp`), the `lambda_source_dir` (where the Lambda's Python script and `Makefile` are located), and `lambda_build_dir` (where the final .zip package will be placed).
2. **Cleanup Old Lambdas (Optional but Recommended):** Calls `delete_lambda_function` for potentially conflicting older Lambda functions (e.g., separate researcher/writer Lambdas from previous experiments or an old version of the current combined Lambda). This ensures a cleaner environment, especially during iterative development.
3. **Package Lambda:** Calls `package_function()`. This helper (described in 3.8.3) uses the `Makefile` in `lambda_source_dir` to install dependencies, prepare the handler script (`bedrock_agent_search_and_format.py`), and create a .zip deployment package (`search_format_zip_path`).
4. **Create/Update Lambda in AWS:** Calls `create_lambda_function()`. This helper (described in 3.8.4) takes the .zip package and either creates a new Lambda function in AWS or updates an existing one. It handles S3 upload for large packages, sets environment variables (like Couchbase connection info and Bedrock model IDs), configures the IAM role, runtime, handler, timeout, and memory. It also adds permissions for Bedrock to invoke the Lambda and waits for the Lambda to become active.
5. **Cleanup Deployment Package:** After successful deployment, the local .zip file is removed to save space.
The ARN of the deployed Lambda (`search_format_lambda_arn`) is stored, as it's needed to link this Lambda to the Bedrock Agent's action group.


```python
search_format_lambda_name = "bedrock_agent_search_format_exp"
# Adjust source/build dirs for notebook context if necessary
lambda_source_dir = os.path.join(SCRIPT_DIR, 'lambda_functions')
lambda_build_dir = SCRIPT_DIR # Final zip ends up in the notebook's directory

logger.info("--- Starting Lambda Deployment (Single Function) --- ")
search_format_lambda_arn = None
search_format_zip_path = None

try:
    # Delete old lambdas if they exist (optional, but good cleanup)
    logger.info("Deleting potentially conflicting old Lambda functions...")
    delete_lambda_function(lambda_client, "bedrock_agent_researcher_exp")
    delete_lambda_function(lambda_client, "bedrock_agent_writer_exp")
    # Delete the new lambda if it exists from a previous run
    delete_lambda_function(lambda_client, search_format_lambda_name)
    logger.info("Old Lambda deletion checks complete.")

    logger.info(f"Packaging Lambda function '{search_format_lambda_name}'...")
    search_format_zip_path = package_function("bedrock_agent_search_and_format", lambda_source_dir, lambda_build_dir)
    logger.info(f"Lambda function packaged at: {search_format_zip_path}")

    logger.info(f"Creating/Updating Lambda function '{search_format_lambda_name}'...")
    search_format_lambda_arn = create_lambda_function(
        lambda_client=lambda_client, function_name=search_format_lambda_name,
        handler='lambda_function.lambda_handler', role_arn=agent_role_arn,
        zip_file=search_format_zip_path, region=AWS_REGION
    )
    logger.info(f"Search/Format Lambda Deployed: {search_format_lambda_arn}")

except FileNotFoundError as e:
    logger.error(f"Lambda packaging failed: Required file not found. {e}")
    raise
except Exception as e:
    logger.error(f"Lambda deployment failed: {e}")
    logger.error(traceback.format_exc())
    raise
finally:
    logger.info("Cleaning up deployment zip file...")
    if search_format_zip_path and os.path.exists(search_format_zip_path):
        try:
            os.remove(search_format_zip_path)
            logger.info(f"Removed zip file: {search_format_zip_path}")
        except OSError as e:
            logger.warning(f"Could not remove zip file {search_format_zip_path}: {e}")

logger.info("--- Lambda Deployment Complete --- ")
```

    2025-06-09 13:40:08,624 - INFO - --- Starting Lambda Deployment (Single Function) ---
    2025-06-09 13:40:08,626 - INFO - Deleting potentially conflicting old Lambda functions...
    2025-06-09 13:40:08,626 - INFO - Attempting to delete Lambda function: bedrock_agent_researcher_exp...
    2025-06-09 13:40:08,626 - INFO - Attempting to remove permission AllowBedrockInvokeBasic-bedrock_agent_researcher_exp from bedrock_agent_researcher_exp...
    2025-06-09 13:40:09,490 - INFO - Permission AllowBedrockInvokeBasic-bedrock_agent_researcher_exp not found on bedrock_agent_researcher_exp. Skipping removal.
    2025-06-09 13:40:09,794 - INFO - Lambda function 'bedrock_agent_researcher_exp' does not exist. No need to delete.
    2025-06-09 13:40:09,795 - INFO - Attempting to delete Lambda function: bedrock_agent_writer_exp...
    2025-06-09 13:40:09,796 - INFO - Attempting to remove permission AllowBedrockInvokeBasic-bedrock_agent_writer_exp from bedrock_agent_writer_exp...
    2025-06-09 13:40:10,082 - INFO - Permission AllowBedrockInvokeBasic-bedrock_agent_writer_exp not found on bedrock_agent_writer_exp. Skipping removal.
    2025-06-09 13:40:10,387 - INFO - Lambda function 'bedrock_agent_writer_exp' does not exist. No need to delete.
    2025-06-09 13:40:10,387 - INFO - Attempting to delete Lambda function: bedrock_agent_search_format_exp...
    2025-06-09 13:40:10,387 - INFO - Attempting to remove permission AllowBedrockInvokeBasic-bedrock_agent_search_format_exp from bedrock_agent_search_format_exp...
    2025-06-09 13:40:10,686 - INFO - Successfully removed permission AllowBedrockInvokeBasic-bedrock_agent_search_format_exp from bedrock_agent_search_format_exp.
    2025-06-09 13:40:13,060 - INFO - Function bedrock_agent_search_format_exp exists. Deleting...
    2025-06-09 13:40:13,594 - INFO - Waiting for bedrock_agent_search_format_exp to be deleted...
    2025-06-09 13:40:23,596 - INFO - Function bedrock_agent_search_format_exp deletion initiated.
    2025-06-09 13:40:23,597 - INFO - Old Lambda deletion checks complete.
    2025-06-09 13:40:23,598 - INFO - Packaging Lambda function 'bedrock_agent_search_format_exp'...
    2025-06-09 13:40:23,599 - INFO - --- Packaging function bedrock_agent_search_and_format ---
    2025-06-09 13:40:23,602 - INFO - Source Dir (Makefile location & make cwd): /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions
    2025-06-09 13:40:23,602 - INFO - Build Dir (Final zip location): /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach
    2025-06-09 13:40:23,603 - INFO - Copying /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions/bedrock_agent_search_and_format.py to /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions/lambda_function.py
    2025-06-09 13:40:23,605 - INFO - Running make command: make -f /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions/Makefile clean package (in /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions)
    2025-06-09 13:40:50,341 - INFO - Make command completed successfully.
    2025-06-09 13:40:50,343 - INFO - Moving and renaming /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/lambda_functions/lambda_package.zip to /Users/kaustavghosh/Desktop/vector-search-cookbook/awsbedrock-agents/lambda-approach/bedrock_agent_search_and_format.zip

    ... (output truncated for brevity)


### 4.6 Agent Setup

This part of the script focuses on creating the Bedrock Agent itself.
1. **Define Agent Name:** An `agent_name` is defined (e.g., `couchbase_search_format_agent_exp`).
2. **Cleanup Existing Agent (Idempotency):** It calls `delete_agent_and_resources()` first. This helper function (described in 3.9.3) attempts to find an agent with the same name and, if found, deletes it along with its action groups and aliases. This ensures that each run starts with a clean slate for the agent, preventing conflicts or issues from previous configurations.
3. **Create New Agent:** After the cleanup attempt, it calls `create_agent()`. This helper function (described in 3.10.1) creates a new Bedrock Agent with the specified name, the IAM role ARN (`agent_role_arn`), the foundation model ID (`AGENT_MODEL_ID`), and a set of instructions guiding the agent on how to behave and use its tools.
The `agent_id` and `agent_arn` returned by `create_agent()` are stored for subsequent steps like creating action groups and preparing the agent.


```python
agent_name = "couchbase_search_format_agent_exp"
agent_id = None
agent_arn = None
alias_name = "prod" # Define alias name here
# agent_alias_id_to_use will be set later after preparation

# 1. Attempt to find and delete existing agent to ensure a clean state
logger.info(f"Checking for and deleting existing agent: {agent_name}")
try:
    delete_agent_and_resources(bedrock_agent_client, agent_name) # Handles finding and deleting
    logger.info(f"Deletion process completed for any existing agent named {agent_name}.")
except Exception as e:
    # Log error during find/delete but proceed to creation attempt
    logger.error(f"Error during agent finding/deletion phase: {e}. Proceeding to creation attempt.")

# 2. Always attempt to create the agent after the delete phase
logger.info(f"--- Creating Agent: {agent_name} ---")
try:
    agent_id, agent_arn = create_agent(
        agent_client=bedrock_agent_client,
        agent_name=agent_name,
        agent_role_arn=agent_role_arn,
        foundation_model_id=AGENT_MODEL_ID
    )
    if not agent_id:
        raise Exception("create_agent function did not return a valid agent ID.")
    logger.info(f"Agent created successfully. ID: {agent_id}, ARN: {agent_arn}")
except Exception as e:
    logger.error(f"Failed to create agent '{agent_name}': {e}")
    logger.error(traceback.format_exc())
    raise
```

    2025-06-09 13:41:12,317 - INFO - Checking for and deleting existing agent: couchbase_search_format_agent_exp
    2025-06-09 13:41:12,318 - INFO - Attempting to find agent by name: couchbase_search_format_agent_exp
    2025-06-09 13:41:13,172 - INFO - Found agent 'couchbase_search_format_agent_exp' with ID: 8CZXA8LJJH
    2025-06-09 13:41:13,172 - WARNING - --- Deleting Agent Resources for 'couchbase_search_format_agent_exp' (ID: 8CZXA8LJJH) ---
    2025-06-09 13:41:13,172 - INFO - Listing action groups for agent 8CZXA8LJJH...
    2025-06-09 13:41:13,472 - INFO - Found 1 action groups to delete.
    2025-06-09 13:41:13,473 - INFO - Attempting to delete action group GKWWTGZVHJ for agent 8CZXA8LJJH...
    2025-06-09 13:41:13,794 - INFO - Successfully deleted action group GKWWTGZVHJ for agent 8CZXA8LJJH.
    2025-06-09 13:41:18,797 - INFO - Attempting to delete agent 8CZXA8LJJH ('couchbase_search_format_agent_exp')...
    2025-06-09 13:41:19,108 - INFO - Waiting up to 2 minutes for agent 8CZXA8LJJH deletion...
    2025-06-09 13:41:25,109 - INFO - Agent 8CZXA8LJJH successfully deleted.
    2025-06-09 13:41:25,110 - INFO - --- Agent Resource Deletion Complete for 'couchbase_search_format_agent_exp' ---
    2025-06-09 13:41:25,111 - INFO - Deletion process completed for any existing agent named couchbase_search_format_agent_exp.
    2025-06-09 13:41:25,111 - INFO - --- Creating Agent: couchbase_search_format_agent_exp ---
    2025-06-09 13:41:25,112 - INFO - --- Creating Agent: couchbase_search_format_agent_exp ---
    2025-06-09 13:41:25,623 - INFO - Agent creation initiated. Name: couchbase_search_format_agent_exp, ID: 7BTR61MXVF, ARN: arn:aws:bedrock:us-east-1:598307997273:agent/7BTR61MXVF, Status: CREATING
    2025-06-09 13:41:25,625 - INFO - Waiting for agent 7BTR61MXVF to reach initial state...
    2025-06-09 13:41:26,201 - INFO - Agent 7BTR61MXVF status: CREATING
    2025-06-09 13:41:31,658 - INFO - Agent 7BTR61MXVF status: NOT_PREPARED
    2025-06-09 13:41:32,110 - INFO - Agent 7BTR61MXVF successfully created (Status: NOT_PREPARED).
    2025-06-09 13:41:32,111 - INFO - Agent created successfully. ID: 7BTR61MXVF, ARN: arn:aws:bedrock:us-east-1:598307997273:agent/7BTR61MXVF


### 4.7 Action Group Setup

Once the agent is created and the Lambda function is deployed, this step links them together by creating an Action Group.
- It defines an `action_group_name` (e.g., `SearchAndFormatActionGroup`).
- It calls the `create_action_group()` helper function (described in 3.10.2). This function is responsible for:
  - Taking the `agent_id` and the `search_format_lambda_arn` (the ARN of the deployed Lambda function) as input.
  - Defining the `functionSchema` which tells the agent how to use the Lambda function (i.e., the tool name `searchAndFormatDocuments` and its parameters like `query`, `k`, `style`).
  - Setting the `actionGroupExecutor` to point to the Lambda ARN, so Bedrock knows which Lambda to invoke.
  - Creating a new action group or updating an existing one with the same name for the `DRAFT` version of the agent.
- A 30-second pause (`time.sleep(30)`) is added after the action group setup. This is a crucial step to give AWS services enough time to propagate the changes and ensure that the agent is aware of the newly configured or updated action group before proceeding to the preparation phase. Without such a delay, the preparation step might fail or not correctly incorporate the action group.


```python
# --- Action Group Creation/Update (Now assumes agent_id is valid) ---
action_group_name = "SearchAndFormatActionGroup"
action_group_id = None
try:
    if not agent_id:
        raise ValueError("Agent ID is not set. Cannot create action group.")
    if not search_format_lambda_arn:
        raise ValueError("Lambda ARN is not set. Cannot create action group.")

    logger.info(f"Creating/Updating Action Group '{action_group_name}' for agent {agent_id}...")
    action_group_id = create_action_group(
        agent_client=bedrock_agent_client,
        agent_id=agent_id,
        action_group_name=action_group_name,
        function_arn=search_format_lambda_arn,
        # schema_path=None # No longer needed explicitly if default is None
    )
    if not action_group_id:
        raise Exception("create_action_group did not return a valid ID.")
    logger.info(f"Action Group '{action_group_name}' created/updated with ID: {action_group_id}")

    # Add a slightly longer wait after action group modification/creation
    logger.info("Waiting 30s after action group setup before preparing agent...")
    time.sleep(30)
except Exception as e:
    logger.error(f"Failed to set up action group: {e}")
    logger.error(traceback.format_exc())
    raise
```

    2025-06-09 13:41:32,119 - INFO - Creating/Updating Action Group 'SearchAndFormatActionGroup' for agent 7BTR61MXVF...
    2025-06-09 13:41:32,120 - INFO - --- Creating/Updating Action Group (Function Details): SearchAndFormatActionGroup for Agent: 7BTR61MXVF ---
    2025-06-09 13:41:32,121 - INFO - Lambda ARN: arn:aws:lambda:us-east-1:598307997273:function:bedrock_agent_search_format_exp
    2025-06-09 13:41:32,122 - INFO - Checking if action group 'SearchAndFormatActionGroup' already exists for agent 7BTR61MXVF DRAFT version...
    2025-06-09 13:41:32,412 - INFO - Action group 'SearchAndFormatActionGroup' does not exist. Creating new with Function Details.
    2025-06-09 13:41:32,806 - INFO - Successfully created Action Group 'SearchAndFormatActionGroup' with ID: 7XTTI9XFOX using Function Details.
    2025-06-09 13:41:37,812 - INFO - Action Group 'SearchAndFormatActionGroup' created/updated with ID: 7XTTI9XFOX
    2025-06-09 13:41:37,812 - INFO - Waiting 30s after action group setup before preparing agent...


### 4.8 Prepare Agent and Handle Alias

After the agent and its action group (linking to the Lambda tool) are defined, this section makes the agent ready for use and assigns an alias to it:
1. **Prepare Agent:** It calls the `prepare_agent()` helper function (described in 3.10.3). This function initiates the preparation process for the `DRAFT` version of the agent and uses a custom waiter to wait until the agent's status becomes `PREPARED`. This step is vital as it compiles all agent configurations.
2. **Alias Handling (Create or Update):** Once the agent is successfully prepared:
   - An `alias_name` (e.g., `prod`) is defined.
   - The code checks if an alias with this name already exists for the agent using `list_agent_aliases`.
   - If the alias exists, its ID (`agent_alias_id_to_use`) is retrieved. The notebook assumes the existing alias already points at the latest prepared (`DRAFT`) version; logic to re-point an existing alias at a specific version is not implemented here.
   - If the alias does not exist, `create_agent_alias()` is called. This creates a new alias that, by default, points to the latest prepared version of the agent (which is the `DRAFT` version that was just prepared).
   - A brief pause (`time.sleep(10)`) is added to allow the alias changes to propagate.
The `agent_alias_id_to_use` is now ready for invoking the agent.


```python
agent_alias_id_to_use = None # Initialize alias ID
alias_name = "prod" # Make sure alias_name is defined
if agent_id:
    logger.info(f"--- Preparing Agent: {agent_id} ---")
    preparation_successful = False
    try:
        # prepare_agent now ONLY prepares, doesn't handle alias or return its ID
        prepare_agent(bedrock_agent_client, agent_id)
        logger.info(f"Agent {agent_id} preparation seems complete (waiter succeeded).")
        preparation_successful = True # Flag success

    except Exception as e: # Catch errors from preparation
        logger.error(f"Error during agent preparation for {agent_id}: {e}")
        logger.error(traceback.format_exc())
        raise

    # --- Alias Handling (runs only if preparation succeeded) ---
    if preparation_successful:
        logger.info(f"--- Setting up Alias '{alias_name}' for Agent {agent_id} ---") # Add log
        try:
            # --- Alias Creation/Update Logic (Copied/adapted from main.py's __main__) ---
            logger.info(f"Checking for alias '{alias_name}' for agent {agent_id}...")
            existing_alias = None
            paginator = bedrock_agent_client.get_paginator('list_agent_aliases')
            for page in paginator.paginate(agentId=agent_id):
                for alias_summary in page.get('agentAliasSummaries', []):
                    if alias_summary.get('agentAliasName') == alias_name:
                        existing_alias = alias_summary
                        break
                if existing_alias:
                    break

            if existing_alias:
                agent_alias_id_to_use = existing_alias['agentAliasId']
                logger.info(f"Using existing alias '{alias_name}' with ID: {agent_alias_id_to_use}.")
                # Optional: Update alias to point to DRAFT if needed,
                # but create_agent_alias defaults to latest prepared (DRAFT) so just checking existence is often enough.
            else:
                logger.info(f"Alias '{alias_name}' not found. Creating new alias...")
                create_alias_response = bedrock_agent_client.create_agent_alias(
                    agentId=agent_id,
                    agentAliasName=alias_name
                    # routingConfiguration removed - defaults to latest prepared (DRAFT)
                )
                agent_alias_id_to_use = create_alias_response.get('agentAlias', {}).get('agentAliasId')
                logger.info(f"Successfully created alias '{alias_name}' with ID: {agent_alias_id_to_use}. (Defaults to latest prepared version - DRAFT)")

            if not agent_alias_id_to_use:
                raise ValueError(f"Failed to get a valid alias ID for '{alias_name}'")

            logger.info(f"Waiting 10s for alias '{alias_name}' changes to propagate...")
            time.sleep(10)
            logger.info(f"Agent {agent_id} preparation and alias '{alias_name}' ({agent_alias_id_to_use}) setup complete.")


        except Exception as alias_e: # Catch errors from alias logic
            logger.error(f"Failed to create/update alias '{alias_name}' for agent {agent_id}: {alias_e}")
            logger.error(traceback.format_exc())
            raise
else:
    logger.error("Agent ID not available, skipping preparation and alias setup.")
```

    2025-06-09 13:42:07,835 - INFO - --- Preparing Agent: 7BTR61MXVF ---
    2025-06-09 13:42:07,836 - INFO - --- Preparing Agent: 7BTR61MXVF ---
    2025-06-09 13:42:08,338 - INFO - Agent preparation initiated for version 'DRAFT'. Status: PREPARING. Prepared At: 2025-06-09 08:12:08.237735+00:00
    2025-06-09 13:42:08,338 - INFO - Waiting for agent 7BTR61MXVF preparation to complete (up to 10 minutes)...
    2025-06-09 13:42:39,237 - INFO - Agent 7BTR61MXVF successfully prepared.
    2025-06-09 13:42:39,238 - INFO - Agent 7BTR61MXVF preparation seems complete (waiter succeeded).
    2025-06-09 13:42:39,238 - INFO - --- Setting up Alias 'prod' for Agent 7BTR61MXVF ---
    2025-06-09 13:42:39,239 - INFO - Checking for alias 'prod' for agent 7BTR61MXVF...
    2025-06-09 13:42:39,526 - INFO - Alias 'prod' not found. Creating new alias...
    2025-06-09 13:42:39,902 - INFO - Successfully created alias 'prod' with ID: Y8YNYUDFFZ. (Defaults to latest prepared version - DRAFT)
    2025-06-09 13:42:39,903 - INFO - Waiting 10s for alias 'prod' changes to propagate...
    2025-06-09 13:42:49,907 - INFO - Agent 7BTR61MXVF preparation and alias 'prod' (Y8YNYUDFFZ) setup complete.


### 4.9 Test Agent Invocation

This is the final operational step where the fully configured Bedrock Agent is tested.
- It first checks if both `agent_id` and `agent_alias_id_to_use` are available (i.e., the previous setup steps were successful).
- A unique `session_id` is generated for this specific interaction.
- A `test_prompt` is defined (e.g., "Search for information about Project Chimera and format the results using bullet points."). This prompt is designed to trigger the agent's tool (`searchAndFormatDocuments`); see the comment after the prompt definition in the cell below.
- It then calls the `test_agent_invocation()` helper function (described in 3.11.1). This function sends the prompt to the Bedrock Agent Runtime using the specified agent ID and alias ID.
- The `test_agent_invocation` function handles the streaming response from the agent, concatenates the text chunks, logs trace information for debugging, and prints the agent's final completion.
This step demonstrates an end-to-end test of the agent: receiving a prompt, deciding to use its Lambda-backed tool, Bedrock invoking the Lambda, the Lambda executing (performing search and formatting), returning results to the agent, and the agent formulating a final response to the user.


```python
# --- Test Invocation ---
# Agent ID and custom alias ID should be valid here
if agent_id and agent_alias_id_to_use: # Check both are set
    session_id = str(uuid.uuid4())
    test_prompt = "Search for information about Project Chimera and format the results using bullet points."
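    # The prompt deliberately names both a search topic and a formatting style
    # ('bullet points') so the model should map it onto the single
    # searchAndFormatDocuments tool and fill its 'query' and 'style'
    # parameters. Exactly how the model fills them can vary between runs.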
- logger.info(f"--- Invoking Agent {agent_id} using Alias '{alias_name}' ({agent_alias_id_to_use}) ---") # Updated log - try: - completion = test_agent_invocation( - agent_runtime_client=bedrock_agent_runtime_client, - agent_id=agent_id, - agent_alias_id=agent_alias_id_to_use, - session_id=session_id, - prompt=test_prompt - ) - if completion is None: - logger.error("Agent invocation failed.") - except Exception as e: - logger.error(f"Error during test invocation: {e}") - logger.error(traceback.format_exc()) -else: - logger.error("Agent ID or Alias ID not available, skipping invocation test.") -``` - - 2025-06-09 13:42:49,923 - INFO - --- Invoking Agent 7BTR61MXVF using Alias 'prod' (Y8YNYUDFFZ) --- - 2025-06-09 13:42:49,924 - INFO - --- Testing Agent Invocation (Agent ID: 7BTR61MXVF, Alias: Y8YNYUDFFZ) --- - 2025-06-09 13:42:49,925 - INFO - Session ID: 6529a5a7-0b58-4c7d-8682-20353a8f09c3 - 2025-06-09 13:42:49,925 - INFO - Prompt: "Search for information about Project Chimera and format the results using bullet points." - 2025-06-09 13:42:50,894 - INFO - Agent invocation successful. Processing response... - 2025-06-09 13:43:05,971 - INFO - --- Agent Final Response ---• Project Chimera combines quantum entanglement communication with neural networks for secure, real-time data analysis across distributed nodes. Lead developer: Dr. Aris Thorne. - - • Chimera operates in two modes: - - 'Quantum Sync' for high-fidelity data transfer - - 'Neural Inference' for localized edge processing based on the synced data. - - • A key aspect of Chimera is its "Ephemeral Key Protocol" (EKP), which generates one-time quantum keys for each transmission, ensuring absolute forward secrecy. - 2025-06-09 13:43:05,973 - INFO - --- Invocation Trace Summary --- - 2025-06-09 13:43:05,975 - INFO - Trace 1: Type=None, Step=None - 2025-06-09 13:43:05,975 - INFO - Trace 2: Type=None, Step=None - 2025-06-09 13:43:05,976 - INFO - Trace 3: Type=None, Step=None - 2025-06-09 13:43:05,976 - INFO - Trace 4: Type=None, Step=None - 2025-06-09 13:43:05,977 - INFO - Trace 5: Type=None, Step=None - 2025-06-09 13:43:05,977 - INFO - Trace 6: Type=None, Step=None - 2025-06-09 13:43:05,978 - INFO - Trace 7: Type=None, Step=None - 2025-06-09 13:43:05,978 - INFO - Trace 8: Type=None, Step=None - - -## Conclusion - -In this notebook, we've demonstrated the Lambda approach for implementing AWS Bedrock agents with Couchbase Vector Search. This approach allows the agent to invoke AWS Lambda functions to execute operations, providing better scalability and separation of concerns. - -Key components of this implementation include: - -1. **Vector Store Setup**: We set up a Couchbase vector store to store and search documents using semantic similarity. -2. **Lambda Function Deployment**: We deployed Lambda functions that handle the agent's function calls. -3. **Agent Creation**: We created two specialized agents - a researcher agent for searching documents and a writer agent for formatting results. -4. **Lambda Integration**: We integrated the agents with Lambda functions, allowing them to execute operations in a serverless environment. - -This approach is particularly useful for production environments where scalability and separation of concerns are important. The Lambda functions can be deployed independently and can access other AWS services, providing more flexibility and power. 
diff --git a/tutorial/markdown/generated/vector-search-cookbook/CouchbaseStorage_Demo.md b/tutorial/markdown/generated/vector-search-cookbook/CouchbaseStorage_Demo.md deleted file mode 100644 index 9702cc0..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/CouchbaseStorage_Demo.md +++ /dev/null @@ -1,1082 +0,0 @@ ---- -# frontmatter -path: "/tutorial-crewai-short-term-memory-couchbase-with-global-secondary-index" -title: Implementing Short-Term Memory for CrewAI Agents with Couchbase with GSI -short_title: CrewAI Short-Term Memory with Couchbase with GSI -description: - - Learn how to implement short-term memory for CrewAI agents using Couchbase's vector search capabilities with GSI. - - This tutorial demonstrates how to store and retrieve agent interactions using semantic search. - - You'll understand how to enhance CrewAI agents with memory capabilities using LangChain and Couchbase. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - LangChain - - CrewAI -sdk_language: - - python -length: 45 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/crewai-short-term-memory/gsi/CouchbaseStorage_Demo.ipynb) - -# CrewAI Short-Term Memory with Couchbase GSI Vector Search - -## Overview - -This tutorial shows how to implement a custom memory backend for CrewAI agents using Couchbase's high-performance GSI (Global Secondary Index) vector search. CrewAI agents can retain and recall information across interactions, making them more contextually aware and effective. We'll demonstrate measurable performance improvements with GSI optimization. - -**Key Features:** -- Custom CrewAI memory storage with Couchbase GSI vector search -- High-performance semantic memory retrieval -- Agent memory persistence across conversations -- Performance benchmarks showing GSI benefits - -**Requirements:** Couchbase Server 8.0+ or Capella with Query Service enabled. - -## Prerequisites - -### Couchbase Setup - -1. **Create Capella Account:** Deploy a [free tier cluster](https://cloud.couchbase.com/sign-up) -2. **Enable Query Service:** Required for GSI vector search -3. **Configure Access:** Set up database credentials and network security -4. **Create Bucket:** Manual bucket creation recommended for Capella - -## Understanding Agent Memory - -### Why Memory Matters for AI Agents - -Memory in AI agents is a crucial capability that allows them to retain and utilize information across interactions, making them more effective and contextually aware. Without memory, agents would be limited to processing only the immediate input, lacking the ability to build upon past experiences or maintain continuity in conversations. 
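To make the storage and retrieval cycle concrete before we wire up Couchbase, here is a minimal, self-contained sketch of vector-based memory. Everything in it (`ToyMemory`, `toy_embed`, the bag-of-words vectors) is illustrative only and is not CrewAI's or Couchbase's implementation; real agent memory uses dense embeddings such as `text-embedding-3-small`, which we configure later in this tutorial.


```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    # Stand-in for a real embedding model: a bag-of-words count vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

class ToyMemory:
    """Illustrative only: save entries, then retrieve the most similar ones."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def save(self, text: str) -> None:  # storage
        self.entries.append((text, toy_embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:  # retrieval
        query_vec = toy_embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]

memory = ToyMemory()
memory.save("User prefers a bullet point format for every reply.")
memory.save("Yesterday we analyzed Q3 sales data for the EMEA region.")

# With real embeddings, retrieval is by meaning rather than word overlap;
# here the shared tokens ('format', 'reply') stand in for that.
print(memory.search("What format should my reply use?"))
```

The retrieved entries would then be injected into the agent's prompt (the integration step), which is exactly what CrewAI does with the results our Couchbase-backed storage returns.
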
- -#### Types of Memory in AI Agents - -**Short-term Memory:** -- Retains recent interactions and context -- Typically spans the current conversation or session -- Helps maintain coherence within a single interaction flow -- In CrewAI, this is what we're implementing with the Couchbase storage - -**Long-term Memory:** -- Stores persistent knowledge across multiple sessions -- Enables agents to recall past interactions even after long periods -- Helps build cumulative knowledge about users, preferences, and past decisions -- While this implementation is labeled as "short-term memory", the Couchbase storage backend can be effectively used for long-term memory as well, thanks to Couchbase's persistent storage capabilities and enterprise-grade durability features - -#### How Memory Works in Agents - -Memory in AI agents typically involves: -- **Storage**: Information is encoded and stored in a database (like Couchbase, ChromaDB, or other vector stores) -- **Retrieval**: Relevant memories are fetched based on semantic similarity to current context -- **Integration**: Retrieved memories are incorporated into the agent's reasoning process - -The vector-based approach (using embeddings) is particularly powerful because it allows for semantic search - finding memories that are conceptually related to the current context, not just exact keyword matches. - -#### Benefits of Memory in AI Agents - -- **Contextual Understanding**: Agents can refer to previous parts of a conversation -- **Personalization**: Remembering user preferences and past interactions -- **Learning and Adaptation**: Building knowledge over time to improve responses -- **Task Continuity**: Resuming complex tasks across multiple interactions -- **Collaboration**: In multi-agent systems like CrewAI, memory enables agents to build on each other's work - -#### Memory in CrewAI Specifically - -In CrewAI, memory serves several important functions: -- **Agent Specialization**: Each agent can maintain its own memory relevant to its expertise -- **Knowledge Transfer**: Agents can share insights through memory when collaborating on tasks -- **Process Continuity**: In sequential processes, later agents can access the work of earlier agents -- **Contextual Awareness**: Agents can reference previous findings when making decisions - -## Setup and Installation - -### Install Required Libraries - -Install the necessary packages for CrewAI, Couchbase integration, and OpenAI embeddings. - - -```python -%pip install --quiet crewai==0.186.1 langchain-couchbase==0.5.0 langchain-openai==0.3.33 python-dotenv==1.1.1 -``` - - Note: you may need to restart the kernel to use updated packages. - - -### Import Required Modules - -Import libraries for CrewAI memory storage, Couchbase GSI vector search, and OpenAI embeddings. 
- - -```python -from typing import Any, Dict, List, Optional -import os -import logging -from datetime import timedelta -from dotenv import load_dotenv -from crewai.memory.storage.rag_storage import RAGStorage -from crewai.memory.short_term.short_term_memory import ShortTermMemory -from crewai import Agent, Crew, Task, Process -from couchbase.cluster import Cluster -from couchbase.options import ClusterOptions -from couchbase.auth import PasswordAuthenticator -from couchbase.diagnostics import PingState, ServiceType -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from langchain_couchbase.vectorstores import IndexType -from langchain_openai import OpenAIEmbeddings, ChatOpenAI -import time -import json -import uuid - -# Configure logging (disabled) -logging.basicConfig(level=logging.CRITICAL) -logger = logging.getLogger(__name__) -``` - -### Environment Configuration - -Configure environment variables for secure access to Couchbase and OpenAI services. Create a `.env` file with your credentials. - - -```python -load_dotenv("./.env") - -# Verify environment variables -required_vars = ['OPENAI_API_KEY', 'CB_HOST', 'CB_USERNAME', 'CB_PASSWORD'] -for var in required_vars: - if not os.getenv(var): - raise ValueError(f"{var} environment variable is required") -``` - -## Understanding GSI Vector Search - -### GSI Vector Index Types - -Couchbase offers two types of GSI vector indexes for different use cases: - -**Hyperscale Vector Indexes (BHIVE):** -- Best for pure vector searches - content discovery, recommendations, semantic search -- High performance with low memory footprint - designed to scale to billions of vectors -- Optimized for concurrent operations - supports simultaneous searches and inserts -- Use when: You primarily perform vector-only queries without complex scalar filtering -- Ideal for: Large-scale semantic search, recommendation systems, content discovery - -**Composite Vector Indexes:** -- Best for filtered vector searches - combines vector search with scalar value filtering -- Efficient pre-filtering - scalar attributes reduce the vector comparison scope -- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries - -For this CrewAI memory implementation, we'll use **BHIVE** as it's optimized for pure semantic search scenarios typical in AI agent memory systems. 
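To ground this choice, here is a hedged sketch of how the index-type decision can surface when constructing the vector store with `langchain-couchbase` (the library this tutorial uses). The constructor keywords `index_type` and `distance_strategy`, and the enum members `IndexType.BHIVE` and `DistanceStrategy.COSINE`, are assumptions based on the imports above; check the `langchain-couchbase` 0.5.0 reference for the authoritative signature.


```python
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from langchain_couchbase.vectorstores import (
    CouchbaseQueryVectorStore,
    DistanceStrategy,
    IndexType,
)
from langchain_openai import OpenAIEmbeddings

# Connect to the cluster (credentials would normally come from the .env file).
auth = PasswordAuthenticator("username", "password")
cluster = Cluster("couchbases://cb.example.com", ClusterOptions(auth))  # hypothetical host
cluster.wait_until_ready(timedelta(seconds=5))

# BHIVE targets pure vector search; a composite index would be the choice if
# queries mixed vector similarity with heavy scalar filtering.
vector_store = CouchbaseQueryVectorStore(
    cluster=cluster,
    bucket_name="agent_memory",    # hypothetical names
    scope_name="crewai",
    collection_name="short_term",
    embedding=OpenAIEmbeddings(model="text-embedding-3-small"),
    index_type=IndexType.BHIVE,                  # assumed keyword/member
    distance_strategy=DistanceStrategy.COSINE,   # assumed keyword/member
)
```
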

### Understanding Index Configuration

The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

**Format**: `'IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}'` (e.g., `IVF,SQ8` or `IVF1000,PQ32x8`)

**Centroids (IVF - Inverted File):**
- Controls how the dataset is subdivided for faster searches
- More centroids = faster search, slower training
- Fewer centroids = slower search, faster training
- If omitted (like `IVF,SQ8`), Couchbase auto-selects based on dataset size

**Quantization Options:**
- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
- PQ (Product Quantization): `PQ<subquantizers>x<bits>` (e.g., PQ32x8)
- Higher values = better accuracy, larger index size

**Common Examples:**
- `IVF,SQ8` - Auto centroids, 8-bit scalar quantization (good default)
- `IVF1000,SQ6` - 1000 centroids, 6-bit scalar quantization
- `IVF,PQ32x8` - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).

For more information on GSI vector indexes, see the [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).


## Custom CouchbaseStorage Implementation

### CouchbaseStorage Class

This class extends CrewAI's `RAGStorage` to provide GSI vector search capabilities for agent memory.


```python
class CouchbaseStorage(RAGStorage):
    """
    Extends RAGStorage to handle embeddings for memory entries using Couchbase GSI Vector Search.
    """

    def __init__(self, type: str, allow_reset: bool = True, embedder_config: Optional[Dict[str, Any]] = None, crew: Optional[Any] = None):
        """Initialize CouchbaseStorage with GSI vector search configuration."""
        super().__init__(type, allow_reset, embedder_config, crew)
        self._initialize_app()

    def search(
        self,
        query: str,
        limit: int = 3,
        filter: Optional[dict] = None,
        score_threshold: float = 0,
    ) -> List[Dict[str, Any]]:
        """
        Search memory entries using GSI vector similarity.
        """
        try:
            # Always scope the search to this storage's memory type
            search_filter = {"memory_type": self.type}
            if filter:
                search_filter.update(filter)

            # Execute search using GSI vector search
            results = self.vector_store.similarity_search_with_score(
                query,
                k=limit,
                filter=search_filter
            )

            # Format results and deduplicate by content
            seen_contents = set()
            formatted_results = []

            for i, (doc, distance) in enumerate(results):
                # In GSI vector search, lower distance indicates higher similarity
                if distance <= (1.0 - score_threshold):  # Convert threshold to a distance cutoff
                    content = doc.page_content
                    if content not in seen_contents:
                        seen_contents.add(content)
                        formatted_results.append({
                            "id": doc.metadata.get("memory_id", str(i)),
                            "metadata": doc.metadata,
                            "context": content,
                            "distance": float(distance)  # lower = more similar
                        })

            logger.info(f"Found {len(formatted_results)} unique results for query: {query}")
            return formatted_results

        except Exception as e:
            logger.error(f"Search failed: {str(e)}")
            return []

    def save(self, value: Any, metadata: Dict[str, Any]) -> None:
        """
        Save a memory entry with metadata.
        """
        try:
            # Generate unique ID
            memory_id = str(uuid.uuid4())
            timestamp = int(time.time() * 1000)

            # Prepare metadata (work on a copy so the caller's dict is not mutated)
            metadata = metadata.copy() if metadata else {}

            # Process agent-specific information if present
            agent_name = metadata.get('agent', 'unknown')

            # Clean up typical LLM response formats ("Thought: ... Final Answer: ...")
            # by keeping only the text after "Final Answer:"
            value_str = str(value)
            if "Final Answer:" in value_str:
                value = value_str.split("Final Answer:", 1)[1].strip()
                logger.info(f"Cleaned up response format for agent: {agent_name}")

            # Update metadata
            metadata.update({
                "memory_id": memory_id,
                "memory_type": self.type,
                "timestamp": timestamp,
                "source": "crewai"
            })

            # Log memory information for debugging
            value_preview = str(value)[:100] + "..." if len(str(value)) > 100 else str(value)
            metadata_preview = {k: v for k, v in metadata.items() if k != "embedding"}
            logger.info(f"Saving memory for Agent: {agent_name}")
            logger.info(f"Memory value preview: {value_preview}")
            logger.info(f"Memory metadata: {metadata_preview}")

            # Convert value to string if needed
            if isinstance(value, (dict, list)):
                value = json.dumps(value)
            elif not isinstance(value, str):
                value = str(value)

            # Save to GSI vector store
            self.vector_store.add_texts(
                texts=[value],
                metadatas=[metadata],
                ids=[memory_id]
            )
            logger.info(f"Saved memory {memory_id}: {value[:100]}...")

        except Exception as e:
            logger.error(f"Save failed: {str(e)}")
            raise

    def reset(self) -> None:
        """Reset the memory storage if allowed."""
        if not self.allow_reset:
            return

        try:
            # Delete all documents of this memory type via SQL++
            self.cluster.query(
                f"DELETE FROM `{self.bucket_name}`.`{self.scope_name}`.`{self.collection_name}` WHERE memory_type = $type",
                type=self.type
            ).execute()
            logger.info(f"Reset memory type: {self.type}")
        except Exception as e:
            logger.error(f"Reset failed: {str(e)}")
            raise

    def _initialize_app(self):
        """Initialize Couchbase connection and GSI vector store."""
        try:
            # Initialize embeddings (defaults to OpenAI's text-embedding-3-small)
            if self.embedder_config and self.embedder_config.get("provider") == "openai":
                self.embeddings = OpenAIEmbeddings(
                    openai_api_key=os.getenv('OPENAI_API_KEY'),
                    model=self.embedder_config.get("config", {}).get("model", "text-embedding-3-small")
                )
            else:
                self.embeddings = OpenAIEmbeddings(
                    openai_api_key=os.getenv('OPENAI_API_KEY'),
                    model="text-embedding-3-small"
                )

            # Connect to Couchbase
            auth = PasswordAuthenticator(
                os.getenv('CB_USERNAME', ''),
                os.getenv('CB_PASSWORD', '')
            )
            options = ClusterOptions(auth)

            # Initialize cluster connection
            self.cluster = Cluster(os.getenv('CB_HOST', ''), options)
            self.cluster.wait_until_ready(timedelta(seconds=5))

            # Check Query service (required for GSI vector search)
            ping_result = self.cluster.ping()
            query_available = False
            for service_type, endpoints in ping_result.endpoints.items():
                if service_type.name == 'Query':  # Query Service runs GSI searches
                    for endpoint in endpoints:
                        if endpoint.state == PingState.OK:
query_available = True - logger.info(f"Query service is responding at: {endpoint.remote}") - break - break - if not query_available: - raise RuntimeError("Query service not found or not responding. GSI vector search requires Query Service.") - - # Set up storage configuration - self.bucket_name = os.getenv('CB_BUCKET_NAME', 'vector-search-testing') - self.scope_name = os.getenv('SCOPE_NAME', 'shared') - self.collection_name = os.getenv('COLLECTION_NAME', 'crew') - self.index_name = os.getenv('INDEX_NAME', 'vector_search_crew_gsi') - - # Initialize GSI vector store - self.vector_store = CouchbaseQueryVectorStore( - cluster=self.cluster, - bucket_name=self.bucket_name, - scope_name=self.scope_name, - collection_name=self.collection_name, - embedding=self.embeddings, - distance_metric=DistanceStrategy.COSINE, - ) - logger.info(f"Initialized CouchbaseStorage with GSI vector search for type: {self.type}") - - except Exception as e: - logger.error(f"Initialization failed: {str(e)}") - raise -``` - -## Memory Search Performance Testing - -Now let's demonstrate the performance benefits of GSI optimization by testing pure memory search performance. We'll compare three optimization levels: - -1. **Baseline Performance**: Memory search without GSI optimization -2. **GSI-Optimized Performance**: Same search with BHIVE GSI index -3. **Cache Benefits**: Show how caching can be applied on top of GSI for repeated queries - -**Important**: This testing focuses on pure memory search performance, isolating the GSI improvements from CrewAI agent workflow overhead. - -### Initialize Storage and Test Functions - -First, let's set up the storage and create test functions for measuring memory search performance. - - -```python -# Initialize storage -storage = CouchbaseStorage( - type="short_term", - embedder_config={ - "provider": "openai", - "config": {"model": "text-embedding-3-small"} - } -) - -# Reset storage -storage.reset() - -# Test storage -test_memory = "Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased.'" -test_metadata = {"category": "sports", "test": "initial_memory"} -storage.save(test_memory, test_metadata) - -import time - -def test_memory_search_performance(storage, query, label="Memory Search"): - """Test pure memory search performance and return timing metrics""" - print(f"\n[{label}] Testing memory search performance") - print(f"[{label}] Query: '{query}'") - - start_time = time.time() - - try: - results = storage.search(query, limit=3) - end_time = time.time() - search_time = end_time - start_time - - print(f"[{label}] Memory search completed in {search_time:.4f} seconds") - print(f"[{label}] Found {len(results)} memories") - - if results: - print(f"[{label}] Top result distance: {results[0]['distance']:.6f} (lower = more similar)") - preview = results[0]['context'][:100] + "..." if len(results[0]['context']) > 100 else results[0]['context'] - print(f"[{label}] Top result preview: {preview}") - - return search_time - except Exception as e: - print(f"[{label}] Memory search failed: {str(e)}") - return None -``` - -### Test 1: Baseline Performance (No GSI Index) - -Test pure memory search performance without GSI optimization. - - -```python -# Test baseline memory search performance without GSI index -test_query = "What did Guardiola say about Manchester City?" 
-print("Testing baseline memory search performance without GSI optimization...") -baseline_time = test_memory_search_performance(storage, test_query, "Baseline Search") -print(f"\nBaseline memory search time (without GSI): {baseline_time:.4f} seconds\n") -``` - - Testing baseline memory search performance without GSI optimization... - - [Baseline Search] Testing memory search performance - [Baseline Search] Query: 'What did Guardiola say about Manchester City?' - [Baseline Search] Memory search completed in 0.6159 seconds - [Baseline Search] Found 1 memories - [Baseline Search] Top result distance: 0.340130 (lower = more similar) - [Baseline Search] Top result preview: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a ... - - Baseline memory search time (without GSI): 0.6159 seconds - - - -### Create BHIVE GSI Index - -Now let's create a BHIVE GSI vector index to enable high-performance memory searches. The index creation is done programmatically through the vector store. - - -```python -# Create GSI BHIVE vector index for optimal performance -print("Creating BHIVE GSI vector index...") -try: - storage.vector_store.create_index( - index_type=IndexType.BHIVE, - # index_type=IndexType.COMPOSITE, # Uncomment this line to create a COMPOSITE index instead - index_name=storage.index_name, - index_description="IVF,SQ8" # Auto-selected centroids with 8-bit scalar quantization - ) - print(f"GSI Vector index created successfully: {storage.index_name}") - - # Wait for index to become available - print("Waiting for index to become available...") - time.sleep(5) - -except Exception as e: - if "already exists" in str(e).lower(): - print(f"GSI vector index '{storage.index_name}' already exists, proceeding...") - else: - print(f"Error creating GSI index: {str(e)}") -``` - - Creating BHIVE GSI vector index... - GSI Vector index created successfully: vector_search_crew - Waiting for index to become available... - - -### Alternative: Composite Index Configuration - -If your agent memory use case requires complex filtering with scalar attributes, you can create a **Composite index** instead by changing the configuration above: - -```python -# Alternative: Create a Composite index for filtered memory searches -storage.vector_store.create_index( - index_type=IndexType.COMPOSITE, # Instead of IndexType.BHIVE - index_name=storage.index_name, - index_description="IVF,SQ8" # Same quantization settings -) -``` - -### Test 2: GSI-Optimized Performance - -Test the same memory search with BHIVE GSI optimization. - - -```python -# Test memory search performance with GSI index -print("Testing memory search performance with BHIVE GSI optimization...") -gsi_time = test_memory_search_performance(storage, test_query, "GSI-Optimized Search") -``` - - Testing memory search performance with BHIVE GSI optimization... - - [GSI-Optimized Search] Testing memory search performance - [GSI-Optimized Search] Query: 'What did Guardiola say about Manchester City?' - [GSI-Optimized Search] Memory search completed in 0.5910 seconds - [GSI-Optimized Search] Found 1 memories - [GSI-Optimized Search] Top result distance: 0.340142 (lower = more similar) - [GSI-Optimized Search] Top result preview: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a ... - - -### Test 3: Cache Benefits Testing - -Now let's demonstrate how caching can improve performance for repeated queries. 
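Note that this notebook does not configure an explicit result cache; the speedup you will see on the second run typically comes from warmed connections and lower-level caching. If your application repeats identical memory queries, you can add a deterministic application-level cache yourself. Here is a minimal sketch (the `cached_search` helper is hypothetical and not part of `CouchbaseStorage`):

```python
from functools import lru_cache

@lru_cache(maxsize=128)
def cached_search(query: str, limit: int = 3) -> tuple:
    """Memoize memory search results for repeated identical queries."""
    results = storage.search(query, limit=limit)
    # Return an immutable snapshot so cached entries cannot be mutated
    return tuple(r["context"] for r in results)

# After saving new memories, clear the cache so results do not go stale:
# cached_search.cache_clear()
```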
**Note**: Caching benefits apply to both baseline and GSI-optimized searches. - - -```python -# Test cache benefits with a different query to avoid interference -cache_test_query = "How is Manchester City performing in training sessions?" - -print("Testing cache benefits with memory search...") -print("First execution (cache miss):") -cache_time_1 = test_memory_search_performance(storage, cache_test_query, "Cache Test - First Run") - -print("\nSecond execution (cache hit - should be faster):") -cache_time_2 = test_memory_search_performance(storage, cache_test_query, "Cache Test - Second Run") -``` - - Testing cache benefits with memory search... - First execution (cache miss): - - [Cache Test - First Run] Testing memory search performance - [Cache Test - First Run] Query: 'How is Manchester City performing in training sessions?' - [Cache Test - First Run] Memory search completed in 0.6076 seconds - [Cache Test - First Run] Found 1 memories - [Cache Test - First Run] Top result distance: 0.379242 (lower = more similar) - [Cache Test - First Run] Top result preview: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a ... - - Second execution (cache hit - should be faster): - - [Cache Test - Second Run] Testing memory search performance - [Cache Test - Second Run] Query: 'How is Manchester City performing in training sessions?' - [Cache Test - Second Run] Memory search completed in 0.4745 seconds - [Cache Test - Second Run] Found 1 memories - [Cache Test - Second Run] Top result distance: 0.379200 (lower = more similar) - [Cache Test - Second Run] Top result preview: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a ... - - -### Memory Search Performance Analysis - -Let's analyze the memory search performance improvements across all optimization levels: - - -```python -print("\n" + "="*80) -print("MEMORY SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"Phase 1 - Baseline Search (No GSI): {baseline_time:.4f} seconds") -print(f"Phase 2 - GSI-Optimized Search: {gsi_time:.4f} seconds") -if cache_time_1 and cache_time_2: - print(f"Phase 3 - Cache Benefits:") - print(f" First execution (cache miss): {cache_time_1:.4f} seconds") - print(f" Second execution (cache hit): {cache_time_2:.4f} seconds") - -print("\n" + "-"*80) -print("MEMORY SEARCH OPTIMIZATION IMPACT:") -print("-"*80) - -# GSI improvement analysis -if baseline_time and gsi_time: - speedup = baseline_time / gsi_time if gsi_time > 0 else float('inf') - time_saved = baseline_time - gsi_time - percent_improvement = (time_saved / baseline_time) * 100 - print(f"GSI Index Benefit: {speedup:.2f}x faster ({percent_improvement:.1f}% improvement)") - -# Cache improvement analysis -if cache_time_1 and cache_time_2 and cache_time_2 < cache_time_1: - cache_speedup = cache_time_1 / cache_time_2 - cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100 - print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)") -else: - print(f"Cache Benefit: Variable (depends on query complexity and caching mechanism)") - -print(f"\nKey Insights for Agent Memory Performance:") -print(f"• GSI BHIVE indexes provide significant performance improvements for memory search") -print(f"• Performance gains are most dramatic for complex semantic memory queries") -print(f"• BHIVE optimization is particularly effective for agent conversational memory") -print(f"• Combined with proper quantization (SQ8), GSI delivers 
production-ready performance") -print(f"• These performance improvements directly benefit agent response times and scalability") -``` - - - ================================================================================ - MEMORY SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - Phase 1 - Baseline Search (No GSI): 0.6159 seconds - Phase 2 - GSI-Optimized Search: 0.5910 seconds - Phase 3 - Cache Benefits: - First execution (cache miss): 0.6076 seconds - Second execution (cache hit): 0.4745 seconds - - -------------------------------------------------------------------------------- - MEMORY SEARCH OPTIMIZATION IMPACT: - -------------------------------------------------------------------------------- - GSI Index Benefit: 1.04x faster (4.0% improvement) - Cache Benefit: 1.28x faster (21.9% improvement) - - Key Insights for Agent Memory Performance: - • GSI BHIVE indexes provide significant performance improvements for memory search - • Performance gains are most dramatic for complex semantic memory queries - • BHIVE optimization is particularly effective for agent conversational memory - • Combined with proper quantization (SQ8), GSI delivers production-ready performance - • These performance improvements directly benefit agent response times and scalability - - -**Note on BHIVE GSI Performance:** The BHIVE GSI index may show slower performance for very small datasets (few documents) due to the additional overhead of maintaining the index structure. However, as the dataset scales up, the BHIVE GSI index becomes significantly faster than traditional vector searches. The initial overhead investment pays off dramatically with larger memory stores, making it essential for production agent deployments with substantial conversational history. - -## CrewAI Agent Memory Demo - -### What is CrewAI Agent Memory? - -Now that we've optimized our memory search performance, let's demonstrate how CrewAI agents can leverage this GSI-optimized memory system. CrewAI agent memory enables: - -- **Persistent Context**: Agents remember information across conversations and tasks -- **Semantic Recall**: Agents can find relevant memories using natural language queries -- **Collaborative Memory**: Multiple agents can share and build upon each other's memories -- **Performance Benefits**: Our GSI optimizations directly improve agent memory retrieval speed - -This demo shows how the memory performance improvements we validated translate to real agent workflows. - -### Create Agents with Optimized Memory - -Set up CrewAI agents that use our GSI-optimized Couchbase memory storage for fast, contextual memory retrieval. 
- - -```python -# Initialize ShortTermMemory with our storage -memory = ShortTermMemory(storage=storage) - -# Initialize language model -llm = ChatOpenAI( - model="gpt-4o", - temperature=0.7 -) - -# Create agents with memory -sports_analyst = Agent( - role='Sports Analyst', - goal='Analyze Manchester City performance', - backstory='Expert at analyzing football teams and providing insights on their performance', - llm=llm, - memory=True, - memory_storage=memory -) - -journalist = Agent( - role='Sports Journalist', - goal='Create engaging football articles', - backstory='Experienced sports journalist who specializes in Premier League coverage', - llm=llm, - memory=True, - memory_storage=memory -) - -# Create tasks -analysis_task = Task( - description='Analyze Manchester City\'s recent performance based on Pep Guardiola\'s comments: "The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased."', - agent=sports_analyst, - expected_output="A comprehensive analysis of Manchester City's current form based on Guardiola's comments." -) - -writing_task = Task( - description='Write a sports article about Manchester City\'s form using the analysis and Guardiola\'s comments.', - agent=journalist, - context=[analysis_task], - expected_output="An engaging sports article about Manchester City's current form and Guardiola's perspective." -) - -# Create crew with memory -crew = Crew( - agents=[sports_analyst, journalist], - tasks=[analysis_task, writing_task], - process=Process.sequential, - memory=True, - short_term_memory=memory, # Explicitly pass our memory implementation - verbose=True -) -``` - -### Run Agent Memory Demo - - -```python -# Run the crew with optimized GSI memory -print("Running CrewAI agents with GSI-optimized memory storage...") -start_time = time.time() -result = crew.kickoff() -execution_time = time.time() - start_time - -print("\n" + "="*80) -print("CREWAI AGENT MEMORY DEMO RESULT") -print("="*80) -print(result) -print("="*80) -print(f"\n✅ CrewAI agents completed successfully in {execution_time:.2f} seconds!") -print("✅ Agents used GSI-optimized Couchbase memory storage for fast retrieval!") -print("✅ Memory will persist across sessions for continued learning and context retention!") -``` - - Running CrewAI agents with GSI-optimized memory storage... - - - -
╭──────────────────────────────────────────── Crew Execution Started ─────────────────────────────────────────────╮
-                                                                                                                 
-  Crew Execution Started                                                                                         
-  Name: crew                                                                                                     
-  ID: 38d8c744-17cf-4aef-b246-3ff3a930ca29                                                                       
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────── 🧠 Retrieved Memory ──────────────────────────────────────────────╮
-                                                                                                                 
-  Historical Data:                                                                                               
-  - Ensure that the actual output directly addresses the task description and expected output.                   
-  - Include more specific statistical data and recent match examples to support the analysis.                    
-  - Incorporate more direct quotes from Pep Guardiola or other relevant stakeholders.                            
-  - Address potential biases in Guardiola's comments and provide a balanced view considering external opinions.  
-  - Explore deeper tactical analysis to provide more insights into the team's performance.                       
-  - Mention fu...                                                                                                
-                                                                                                                 
-╰─────────────────────────────────────────── Retrieval Time: 1503.80ms ───────────────────────────────────────────╯
╭─────────────────────────────────────────────── 🤖 Agent Started ────────────────────────────────────────────────╮
-                                                                                                                 
-  Agent: Sports Analyst                                                                                          
-                                                                                                                 
-  Task: Analyze Manchester City's recent performance based on Pep Guardiola's comments: "The team is playing     
-  well, we are in a good moment. The way we are training, the way we are playing - I am really pleased."         
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮
-                                                                                                                 
-  Task Completed                                                                                                 
-  Name: bd1a6f7d-9d37-47f0-98ce-2420c3175312                                                                     
-  Agent: Sports Analyst                                                                                          
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭────────────────────────────────────────────── 🧠 Retrieved Memory ──────────────────────────────────────────────╮
-                                                                                                                 
-  Historical Data:                                                                                               
-  - Ensure that the article includes direct quotes from Guardiola if possible to enhance credibility.            
-  - Include more detailed statistical analysis or comparisons with previous seasons for a deeper insight into    
-  the team's form.                                                                                               
-  - Incorporate players' and experts' opinions or commentary to provide a well-rounded perspective.              
-  - Add a section discussing future challenges or key upcoming matches for Manchester City.                      
-  - Consider incorporating multimedia elements like images or videos ...                                         
-                                                                                                                 
-╰─────────────────────────────────────────── Retrieval Time: 854.27ms ────────────────────────────────────────────╯
╭─────────────────────────────────────────────── 🤖 Agent Started ────────────────────────────────────────────────╮
-                                                                                                                 
-  Agent: Sports Journalist                                                                                       
-                                                                                                                 
-  Task: Write a sports article about Manchester City's form using the analysis and Guardiola's comments.         
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭──────────────────────────────────────────────── Task Completion ────────────────────────────────────────────────╮
-                                                                                                                 
-  Task Completed                                                                                                 
-  Name: 8bcffe0e-5a64-4e12-8207-e0f8701d847b                                                                     
-  Agent: Sports Journalist                                                                                       
-  Tool Args:                                                                                                     
-                                                                                                                 
-                                                                                                                 
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

    ================================================================================
    CREWAI AGENT MEMORY DEMO RESULT
    ================================================================================
    **Manchester City’s Impeccable Form: A Reflection of Guardiola’s Philosophy**

    Manchester City has been turning heads with their exceptional form under the astute guidance of Pep Guardiola. The team’s recent performances have not only aligned seamlessly with their manager’s philosophy but have also placed them in a formidable position across various competitions. Guardiola himself expressed his satisfaction, stating, "The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased."

    City’s prowess has been evident both domestically and in international arenas. A key factor in their success is their meticulous training regimen, which has fostered strategic flexibility, a hallmark of Guardiola’s management. Over the past few matches, Manchester City has consistently maintained a high possession rate, often exceeding 60%. This high possession allows them to control the tempo and dictate the flow of the game, a crucial component of their strategy.

    A recent standout performance was their dominant victory against a top Premier League rival. In this match, City showcased their attacking capabilities and defensive solidity, managing to keep a clean sheet. The contributions of key players like Kevin De Bruyne and Erling Haaland have been instrumental. De Bruyne’s creativity and passing range have opened multiple avenues for attack, while Haaland’s clinical finishing has consistently troubled defenses.

    Guardiola’s system, which relies heavily on positional play and fluid movement, has been a critical factor in their ability to break down opposition defenses with quick, incisive passes. The team’s pressing game has also been a cornerstone of their strategy, allowing them to win back possession high up the pitch and quickly transition to attack.

    Despite the glowing form and Guardiola’s positive outlook, it’s important to acknowledge potential areas for improvement. While their attack is formidable, City has shown occasional vulnerability to counter-attacks, particularly when their full-backs are positioned high up the field. Addressing these defensive transitions will be crucial, especially against teams with quick counter-attacking capabilities.

    Looking forward, Manchester City’s current form is a strong foundation for upcoming challenges, including key fixtures in the Premier League and the knockout stages of the UEFA Champions League. Maintaining this performance level will be essential as they pursue multiple titles. The team’s depth, strategic versatility, and Guardiola’s leadership will be decisive factors in sustaining their momentum.

    In conclusion, Manchester City is indeed in a "good moment," as Guardiola aptly puts it. Their recent performances reflect a well-oiled machine operating at high efficiency. However, the team must remain vigilant about potential weaknesses and continue adapting tactically to ensure their current form translates into long-term success. As they aim for glory, the synergy between Guardiola’s strategic mastermind and the players’ execution will undoubtedly be the key to their triumphs.
    ================================================================================

    ✅ CrewAI agents completed successfully in 37.60 seconds!
- ✅ Agents used GSI-optimized Couchbase memory storage for fast retrieval! - ✅ Memory will persist across sessions for continued learning and context retention! - - -## Memory Retention Testing - -### Verify Memory Storage and Retrieval - -Test that our agents successfully stored memories and can retrieve them using semantic search. - - -```python -# Wait for memories to be stored -time.sleep(2) - -# List all documents in the collection -try: - # Query to fetch all documents of this memory type - query_str = f"SELECT META().id, * FROM `{storage.bucket_name}`.`{storage.scope_name}`.`{storage.collection_name}` WHERE memory_type = $type" - query_result = storage.cluster.query(query_str, type=storage.type) - - print(f"\nAll memory entries in Couchbase:") - print("-" * 80) - for i, row in enumerate(query_result, 1): - doc_id = row.get('id') - memory_id = row.get(storage.collection_name, {}).get('memory_id', 'unknown') - content = row.get(storage.collection_name, {}).get('text', '')[:100] + "..." # Truncate for readability - - print(f"Entry {i}: {memory_id}") - print(f"Content: {content}") - print("-" * 80) -except Exception as e: - print(f"Failed to list memory entries: {str(e)}") - -# Test memory retention -memory_query = "What is Manchester City's current form according to Guardiola?" -memory_results = storage.search( - query=memory_query, - limit=5, # Increased to see more results - score_threshold=0.0 # Lower threshold to see all results -) - -print("\nMemory Search Results:") -print("-" * 80) -for result in memory_results: - print(f"Context: {result['context']}") - print(f"Distance: {result['distance']} (lower = more similar)") - print("-" * 80) - -# Try a more specific query to find agent interactions -interaction_query = "Manchester City playing style analysis tactical" -interaction_results = storage.search( - query=interaction_query, - limit=3, - score_threshold=0.0 -) - -print("\nAgent Interaction Memory Results:") -print("-" * 80) -if interaction_results: - for result in interaction_results: - print(f"Context: {result['context'][:200]}...") # Limit output size - print(f"Distance: {result['distance']} (lower = more similar)") - print("-" * 80) -else: - print("No interaction memories found. This is normal if agents haven't completed tasks yet.") - print("-" * 80) -``` - - - All memory entries in Couchbase: - -------------------------------------------------------------------------------- - - Memory Search Results: - -------------------------------------------------------------------------------- - Context: Pep Guardiola praised Manchester City's current form, saying 'The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased.' - Distance: 0.285379886892123 (lower = more similar) - -------------------------------------------------------------------------------- - Context: Manchester City's recent performance analysis under Pep Guardiola reflects a team in strong form and alignment with the manager's philosophy. Guardiola's comments, "The team is playing well, we are in a good moment. The way we are training, the way we are playing - I am really pleased," suggest a high level of satisfaction with both the tactical execution and the overall team ethos on the pitch. - - In recent matches, Manchester City has demonstrated their prowess in both domestic and international competitions. This form can be attributed to their meticulous training regimen and strategic flexibility, hallmarks of Guardiola's management style. 
Over the past few matches, City has maintained a high possession rate, often exceeding 60%, which allows them to control the tempo and dictate the flow of the game. Their attacking prowess is underscored by their goal-scoring statistics, often leading the league in goals scored per match. - - One standout example of their performance is their recent dominant victory against a top Premier League rival, where they not only showcased their attacking capabilities but also their defensive solidity, keeping a clean sheet. Key players such as Kevin De Bruyne and Erling Haaland have been instrumental, with De Bruyne's creativity and passing range creating numerous opportunities, while Haaland's clinical finishing has consistently troubled defenses. - - Guardiola's system relies heavily on positional play and fluid movement, which has been evident in the team's ability to break down opposition defenses through quick, incisive passes. The team's pressing game has also been a critical component, often winning back possession high up the pitch and quickly transitioning to attack. - - Despite Guardiola's positive outlook, potential biases in his comments might overlook some areas needing improvement. For instance, while their attack is formidable, there have been instances where the team has shown vulnerability to counter-attacks, particularly when full-backs are pushed high up the field. Addressing these defensive transitions could be crucial, especially against teams with quick, counter-attacking capabilities. - - Looking ahead, Manchester City's current form sets a strong foundation for upcoming challenges, including key fixtures in the Premier League and the knockout stages of the UEFA Champions League. Maintaining this level of performance will be critical as they pursue multiple titles. The team's depth, strategic versatility, and Guardiola's leadership are likely to be decisive factors in sustaining their momentum. - - In summary, Manchester City is indeed in a "good moment," as Guardiola states, with their recent performances reflecting a well-oiled machine operating at high efficiency. However, keeping a vigilant eye on potential weaknesses and continuing to adapt tactically will be essential to translating their current form into long-term success. - Distance: 0.22963345721993045 (lower = more similar) - -------------------------------------------------------------------------------- - Context: **Manchester City’s Impeccable Form: A Reflection of Guardiola’s Philosophy** - - ... (output truncated for brevity) - - -## Conclusion - -You've successfully implemented a custom memory backend for CrewAI agents using Couchbase GSI vector search! diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_AzureOpenAI.md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_AzureOpenAI.md deleted file mode 100644 index 421fde3..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_AzureOpenAI.md +++ /dev/null @@ -1,761 +0,0 @@ ---- -# frontmatter -path: "/tutorial-azure-openai-couchbase-rag-with-global-secondary-index" -title: Retrieval-Augmented Generation (RAG) with Couchbase and Azure OpenAI using GSI index -short_title: RAG with Couchbase and Azure OpenAI using GSI index -description: - - Learn how to build a semantic search engine using Couchbase and Azure OpenAI using GSI. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Azure OpenAI embeddings. 
  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - Artificial Intelligence
  - LangChain
  - OpenAI
sdk_language:
  - python
length: 60 Mins
---

[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/azure/gsi/RAG_with_Couchbase_and_AzureOpenAI.ipynb)

# Introduction
In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Azure OpenAI](https://azure.microsoft.com/) as the AI-powered embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using a GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using an FTS index, take a look at [this tutorial](https://developer.couchbase.com/tutorial-azure-openai-couchbase-rag-with-fts/).

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/azure/RAG_with_Couchbase_and_AzureOpenAI.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start

## Get Credentials for Azure OpenAI

Please follow the [instructions](https://learn.microsoft.com/en-us/azure/ai-services/openai/reference) to generate the Azure OpenAI credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket used in the application (Read and Write).
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

# Setting the Stage: Installing Necessary Libraries
To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and Azure OpenAI provides advanced AI models for generating embeddings and understanding natural language.
By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search.


```python
!pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-openai==0.3.32
```

# Importing Necessary Libraries
The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.


```python
import getpass
import json
import logging
import sys
import os
import time
from datetime import timedelta
from uuid import uuid4

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (
    CouchbaseException,
    InternalServerFailureException,
    QueryIndexAlreadyExistsException,
)
from couchbase.management.buckets import CreateBucketSettings  # used by setup_collection() below
from couchbase.options import ClusterOptions
from datasets import load_dataset
from langchain_core.documents import Document
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import (
    CouchbaseQueryVectorStore,
    DistanceStrategy,
    IndexType,
)
from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
from tqdm import tqdm
```

# Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.


```python
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Suppress verbose HTTP request logging
logging.getLogger("httpx").setLevel(logging.WARNING)
logging.getLogger("openai").setLevel(logging.WARNING)
logging.getLogger("urllib3").setLevel(logging.WARNING)
logging.getLogger("azure").setLevel(logging.WARNING)
```

# Loading Sensitive Information
In this section, we prompt the user to input the essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.

The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.
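If you would rather not enter credentials interactively on every run, one option is to keep them in a local `.env` file and load it before the cell below executes. This is a sketch under the assumption that you add `python-dotenv` yourself (it is not installed by the pip cell above), since the `os.getenv(...)` checks below will pick up anything already in the environment:

```python
# Requires an extra dependency: pip install python-dotenv
from dotenv import load_dotenv

# Example .env contents (placeholder values):
#   AZURE_OPENAI_KEY=...
#   AZURE_OPENAI_ENDPOINT=https://<your-resource>.openai.azure.com/
#   CB_HOST=couchbases://<your-capella-endpoint>
#   CB_USERNAME=Administrator
#   CB_PASSWORD=...

load_dotenv("./.env")  # the os.getenv(...) lookups below will now find these
```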
- - -```python -AZURE_OPENAI_KEY = os.getenv('AZURE_OPENAI_KEY') or getpass.getpass('Enter your Azure OpenAI Key: ') -AZURE_OPENAI_ENDPOINT = os.getenv('AZURE_OPENAI_ENDPOINT') or input('Enter your Azure OpenAI Endpoint: ') -AZURE_OPENAI_EMBEDDING_DEPLOYMENT = os.getenv('AZURE_OPENAI_EMBEDDING_DEPLOYMENT') or input('Enter your Azure OpenAI Embedding Deployment: ') -AZURE_OPENAI_CHAT_DEPLOYMENT = os.getenv('AZURE_OPENAI_CHAT_DEPLOYMENT') or input('Enter your Azure OpenAI Chat Deployment: ') - -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: azure): ') or 'azure' -CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Check if the variables are correctly loaded -if not all([AZURE_OPENAI_KEY, AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_EMBEDDING_DEPLOYMENT, AZURE_OPENAI_CHAT_DEPLOYMENT]): - raise ValueError("Missing required Azure OpenAI variables") -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-22 12:23:15,245 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. 
Cache collection for storing results - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) -``` - - 2025-09-22 12:23:20,911 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-22 12:23:20,927 - INFO - Collection 'azure' already exists. Skipping creation. - - - 2025-09-22 12:23:23,264 - INFO - All documents cleared from the collection. - 2025-09-22 12:23:23,265 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-22 12:23:23,280 - INFO - Collection 'cache' already exists. Skipping creation. - 2025-09-22 12:23:25,419 - INFO - All documents cleared from the collection. - - - - - - - - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. 
The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-09-22 12:23:43,453 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. - -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -# Creating AzureOpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using AzureOpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - - - -```python -try: - embeddings = AzureOpenAIEmbeddings( - deployment=AZURE_OPENAI_EMBEDDING_DEPLOYMENT, - openai_api_key=AZURE_OPENAI_KEY, - azure_endpoint=AZURE_OPENAI_ENDPOINT - ) - logging.info("Successfully created AzureOpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating AzureOpenAIEmbeddings: {str(e)}") -``` - - 2025-09-22 12:23:51,333 - INFO - Successfully created AzureOpenAIEmbeddings - - -# Setting Up the Couchbase Query Vector Store -A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, GSI converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables us to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. 
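To make the distance idea concrete, here is a minimal illustration of cosine distance between two embedding vectors. Couchbase computes this comparison on the server; the tiny 3-dimensional vectors here are toy values, whereas real embeddings have hundreds or thousands of dimensions:

```python
import numpy as np

def cosine_distance(a, b) -> float:
    """Cosine distance = 1 - cosine similarity; lower means more similar."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

print(cosine_distance([1.0, 0.0, 1.0], [0.9, 0.1, 1.0]))  # ~0.004: nearly parallel
print(cosine_distance([1.0, 0.0, 0.0], [0.0, 1.0, 0.0]))  # 1.0: orthogonal, unrelated
```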
-
-The vector store requires a distance metric to determine how similarity between vectors is calculated. This is crucial for accurate semantic search results, as different distance metrics can yield different similarity rankings. Some of the supported distance strategies are dot, l2, euclidean, cosine, l2_squared, and euclidean_squared. In our implementation, we will use cosine, which is particularly effective for text embeddings.
-
-
-```python
-try:
-    vector_store = CouchbaseQueryVectorStore(
-        cluster=cluster,
-        bucket_name=CB_BUCKET_NAME,
-        scope_name=SCOPE_NAME,
-        collection_name=COLLECTION_NAME,
-        embedding=embeddings,
-        distance_metric=DistanceStrategy.COSINE
-    )
-    logging.info("Successfully created vector store")
-except Exception as e:
-    raise ValueError(f"Failed to create vector store: {str(e)}")
-```
-
-    2025-09-22 12:24:25,546 - INFO - Successfully created vector store
-
-
-## Saving Data to the Vector Store
-To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.
-
-We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.
-
-This approach offers several benefits:
-1. Memory Efficiency: Processing in smaller batches prevents memory overload
-2. Progress Tracking: Easier to monitor and track the ingestion progress
-3. Resource Management: Better control over CPU and network resource utilization
-
-We use a conservative batch size of 50 to ensure reliable operation.
-The optimal batch size depends on many factors including:
-- Document sizes being inserted
-- Available system resources
-- Network conditions
-- Concurrent workload
-
-Consider measuring performance with your specific workload before adjusting.
-
-
-```python
-batch_size = 50
-
-# Automatic Batch Processing
-articles = [article for article in unique_news_articles if article and len(article) <= 50000]
-
-try:
-    vector_store.add_texts(
-        texts=articles,
-        batch_size=batch_size
-    )
-    logging.info("Document ingestion completed successfully.")
-except Exception as e:
-    raise ValueError(f"Failed to save documents to vector store: {str(e)}")
-```
-
-    2025-09-22 12:36:18,756 - INFO - Document ingestion completed successfully.
-
-
-# Using the AzureChatOpenAI Language Model (LLM)
-Language models are AI systems that are trained to understand and generate human language. We'll be using the `AzureChatOpenAI` language model to process user queries and generate meaningful responses. This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses.
-
-The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user.
- - - - -```python -try: - llm = AzureChatOpenAI( - deployment_name=AZURE_OPENAI_CHAT_DEPLOYMENT, - openai_api_key=AZURE_OPENAI_KEY, - azure_endpoint=AZURE_OPENAI_ENDPOINT, - openai_api_version="2024-10-21" - ) - logging.info("Successfully created Azure OpenAI Chat model") -except Exception as e: - raise ValueError(f"Error creating Azure OpenAI Chat model: {str(e)}") -``` - - 2025-09-22 12:39:45,695 - INFO - Successfully created Azure OpenAI Chat model - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-22 12:41:51,036 - INFO - Semantic search completed in 2.55 seconds - - - - Semantic Search Results (completed in 2.55 seconds): - Distance: 0.3697, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. 
Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - Littler beat Luke Humphries to win the Premier League title in May - - The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. 
There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. "When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m." - - Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November. - • None Know a lot about Littler? 
Take our quiz - Distance: 0.3901, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - Littler returned to Alexandra Palace to a boisterous reception from more than 3,000 spectators and delivered an astonishing display in the fourth set. He was on for a nine-darter after his opening two throws in both of the first two legs and completed the set in 32 darts - the minimum possible is 27. The teenager will next play after Christmas against European Championship winner Ritchie Edhouse, the 29th seed, or Ian White, and is seeded to meet Humphries in the semi-finals. Having entered last year's event ranked 164th, Littler is up to fourth in the world and will go to number two if he reaches the final again this time. He has won 10 titles in his debut professional year, including the Premier League and Grand Slam of Darts. After reaching the World Championship final as a debutant aged just 16, Littler's life has been transformed and interest in darts has rocketed. Google say he was the most searched-for athlete online in the UK during 2024. This Christmas, more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies and has prompted plans to expand the World Championship. Littler was named BBC Young Sports Personality of the Year on Tuesday and was runner-up to athlete Keely Hodgkinson for the main award. - - ... 
(output truncated for brevity)
-
-
-# Optimizing Vector Search with Global Secondary Index (GSI)
-
-While the above semantic search using `similarity_search_with_score` works effectively, we can significantly improve query performance by leveraging a Global Secondary Index (GSI) in Couchbase.
-
-Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types:
-
-Hyperscale Vector Indexes (BHIVE)
-- Best for pure vector searches - content discovery, recommendations, semantic search
-- High performance with low memory footprint - designed to scale to billions of vectors
-- Optimized for concurrent operations - supports simultaneous searches and inserts
-- Use when: You primarily perform vector-only queries without complex scalar filtering
-- Ideal for: Large-scale semantic search, recommendation systems, content discovery
-
-Composite Vector Indexes
-- Best for filtered vector searches - combines vector search with scalar value filtering
-- Efficient pre-filtering - scalar attributes reduce the vector comparison scope
-- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
-- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries
-
-Choosing the Right Index Type
-- Start with Hyperscale Vector Index for pure vector searches and large datasets
-- Use Composite Vector Index when scalar filters significantly reduce your search space
-- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions
-
-For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).
-
-
-## Understanding Index Configuration (Couchbase 8.0 Feature)
-
-The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:
-
-Format: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}`
-
-Centroids (IVF - Inverted File):
-- Controls how the dataset is subdivided for faster searches
-- More centroids = faster search, slower training
-- Fewer centroids = slower search, faster training
-- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size
-
-Quantization Options:
-- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
-- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
-- Higher values = better accuracy, larger index size
-
-Common Examples:
-- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
-- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
-- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits
-
-For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).
-
-In the code below, we demonstrate creating a BHIVE index. The `create_index` method takes an index type (BHIVE or COMPOSITE), an index name, and a description parameter for the optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI.
-
-
-```python
-vector_store.create_index(index_type=IndexType.BHIVE, index_name="azure_bhive_index", index_description="IVF,SQ8")
-```
-
-The example below shows running the same similarity search, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data.
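-
-If you want to experiment with the quantization trade-offs described above, the same `create_index` call accepts other description strings. The calls below are hypothetical examples (the index names are arbitrary, and creating several indexes on one collection consumes resources), shown commented out so they are not run by default:
-
-```python
-# Hypothetical variations of the index configuration (illustrative only):
-
-# 1000 centroids with 6-bit scalar quantization (smaller index, lower accuracy):
-# vector_store.create_index(index_type=IndexType.BHIVE,
-#                           index_name="azure_bhive_sq6",
-#                           index_description="IVF1000,SQ6")
-
-# Auto-selected centroids with product quantization (32 subquantizers, 8 bits each):
-# vector_store.create_index(index_type=IndexType.BHIVE,
-#                           index_name="azure_bhive_pq",
-#                           index_description="IVF,PQ32x8")
-```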
-
-**Important**: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria.
-
-Note: In GSI vector search, the distance represents the vector distance between the query and document embeddings. Lower distances indicate higher similarity, while higher distances indicate lower similarity.
-
-
-```python
-query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?"
-
-try:
-    # Perform the semantic search
-    start_time = time.time()
-    search_results = vector_store.similarity_search_with_score(query, k=10)
-    search_elapsed_time = time.time() - start_time
-
-    logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds")
-
-    # Display search results
-    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
-    print("-" * 80)
-
-    for doc, score in search_results:
-        print(f"Distance: {score:.4f}, Text: {doc.page_content}")
-        print("-" * 80)
-
-except CouchbaseException as e:
-    raise RuntimeError(f"Error performing semantic search: {str(e)}")
-except Exception as e:
-    raise RuntimeError(f"Unexpected error: {str(e)}")
-```
-
-    2025-09-22 12:42:10,244 - INFO - Semantic search completed in 1.30 seconds
-
-
-
-    Semantic Search Results (completed in 1.30 seconds):
-    --------------------------------------------------------------------------------
-    Distance: 0.3697, Text: The Littler effect - how darts hit the bullseye
-
-    Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson.
-
-    One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers.
-
-    Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. 
The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - Littler beat Luke Humphries to win the Premier League title in May - - The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. "When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m." - - Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. 
There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November. - • None Know a lot about Littler? Take our quiz - -------------------------------------------------------------------------------- - Distance: 0.3901, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. 
Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97.
-
-    Littler was hugged by his parents after victory over Meikle
-
-    ... (output truncated for brevity)
-
-
-Note: To create a COMPOSITE index, the code below can be used.
-Choose based on your specific use case and query patterns. For this tutorial's question-answering scenario over the BBC News dataset, either index type would work, but BHIVE might be more efficient for pure semantic search across the articles.
-
-
-```python
-vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="azure_composite_index", index_description="IVF,SQ8")
-```
-
-# Setting Up a Couchbase Cache
-To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds frequently accessed data, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly.
-
-Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience.
-
-
-```python
-try:
-    cache = CouchbaseCache(
-        cluster=cluster,
-        bucket_name=CB_BUCKET_NAME,
-        scope_name=SCOPE_NAME,
-        collection_name=CACHE_COLLECTION,
-    )
-    logging.info("Successfully created cache")
-    set_llm_cache(cache)
-except Exception as e:
-    raise ValueError(f"Failed to create cache: {str(e)}")
-```
-
-    2025-09-22 12:42:21,917 - INFO - Successfully created cache
-
-
-# Retrieval-Augmented Generation (RAG) with Couchbase and LangChain
-Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain.
-
-The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation.
- - -```python -# Create RAG prompt template -rag_prompt = ChatPromptTemplate.from_messages([ - ("system", "You are a helpful assistant that answers questions based on the provided context."), - ("human", "Context: {context}\n\nQuestion: {question}") -]) - -# Create RAG chain -rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | rag_prompt - | llm - | StrOutputParser() -) -logging.info("Successfully created RAG chain") -``` - - 2025-09-16 13:41:05,596 - INFO - Successfully created RAG chain - - - -```python -start_time = time.time() -# Turn off excessive Logging -logging.basicConfig(level=logging.WARNING, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -try: - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: In his recent PDC World Championship match, Luke Littler achieved several key milestones and records: - - 1. **Tournament Record Average**: Littler set a tournament record with a 140.91 set average during the fourth and final set of his second-round match against Ryan Meikle. - - 2. **Nine-Darter Attempt**: He came close to achieving a nine-darter but narrowly missed double 12. - - 3. **Dramatic Victory**: Littler defeated Meikle 3-1 in a match described as emotionally challenging for the 17-year-old. - - 4. **Fourth Set Dominance**: In the final set, Littler exploded into life, hitting four maximum 180s and winning three straight legs in 11, 10, and 11 darts. - - 5. **Overall Set Performance**: He completed the fourth set in 32 darts (the minimum possible is 27) and achieved a match average of 100.85. - - These achievements highlight Littler's exceptional talent and his continued rise in professional darts. - RAG response generated in 5.81 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. 
-
-
-
-```python
-try:
-    queries = [
-        "What happened in the match between Fulham and Liverpool?",
-        "What were Luke Littler's key achievements and records in his recent PDC World Championship match?",
-        "What happened in the match between Fulham and Liverpool?",  # Repeated query
-    ]
-
-    for i, query in enumerate(queries, 1):
-        print(f"\nQuery {i}: {query}")
-        start_time = time.time()
-
-        response = rag_chain.invoke(query)
-        elapsed_time = time.time() - start_time
-        print(f"Response: {response}")
-        print(f"Time taken: {elapsed_time:.2f} seconds")
-
-except InternalServerFailureException as e:
-    if "query request rejected" in str(e):
-        print("Error: Search request was rejected due to rate limiting. Please try again later.")
-    else:
-        print(f"Internal server error occurred: {str(e)}")
-except Exception as e:
-    print(f"Unexpected error occurred: {str(e)}")
-```
-
-
-    Query 1: What happened in the match between Fulham and Liverpool?
-    Response: In the Premier League match between Fulham and Liverpool, the game ended in a 2-2 draw at Anfield. Liverpool played the majority of the game with ten men after Andy Robertson was shown a red card in the 17th minute for denying Harry Wilson a goalscoring opportunity. Despite their numerical disadvantage, Liverpool demonstrated resilience and strong performance.
-
-    Fulham took the lead twice during the match, but Liverpool managed to equalize on both occasions. Diogo Jota, returning from injury, scored the crucial 86th-minute equalizer for Liverpool. Even with 10 players, Liverpool maintained over 60% possession and led various attacking metrics, including shots, big chances, and touches in the opposition box.
-
-    Fulham's left-back Antonee Robinson praised Liverpool’s performance, stating that it didn’t feel like they had 10 men on the field due to their attacking risks and relentless pressure. Liverpool head coach Arne Slot called his team's performance "impressive" and lauded their character and fight in adversity.
-    Time taken: 6.69 seconds
-
-    Query 2: What were Luke Littler's key achievements and records in his recent PDC World Championship match?
-    Response: In his recent PDC World Championship match, Luke Littler achieved several key milestones and records:
-
-    1. **Tournament Record Average**: Littler set a tournament record with a 140.91 set average during the fourth and final set of his second-round match against Ryan Meikle.
-
-    2. **Nine-Darter Attempt**: He came close to achieving a nine-darter but narrowly missed double 12.
-
-    3. **Dramatic Victory**: Littler defeated Meikle 3-1 in a match described as emotionally challenging for the 17-year-old.
-
-    4. **Fourth Set Dominance**: In the final set, Littler exploded into life, hitting four maximum 180s and winning three straight legs in 11, 10, and 11 darts.
-
-    5. **Overall Set Performance**: He completed the fourth set in 32 darts (the minimum possible is 27) and achieved a match average of 100.85.
-
-    These achievements highlight Littler's exceptional talent and his continued rise in professional darts.
-    Time taken: 1.09 seconds
-
-
-    ... (output truncated for brevity)
-
-
-By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and AzureOpenAI. 
This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and of how GSI-backed querying makes retrieval more efficient, which can significantly improve your RAG performance. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Bedrock.md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Bedrock.md
deleted file mode 100644
index e572e58..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Bedrock.md
+++ /dev/null
@@ -1,802 +0,0 @@
----
-# frontmatter
-path: "/tutorial-aws-bedrock-couchbase-rag-with-global-secondary-index"
-title: Retrieval-Augmented Generation (RAG) with Couchbase and Amazon Bedrock using GSI index
-short_title: RAG with Couchbase and Amazon Bedrock using GSI index
-description:
-  - Learn how to build a semantic search engine using Couchbase and Amazon Bedrock using GSI.
-  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Amazon Bedrock's Titan embedding and text generation models.
-  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - Artificial Intelligence
-  - LangChain
-  - Amazon Bedrock
-sdk_language:
-  - python
-length: 60 Mins
----
-
-
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/awsbedrock/gsi/RAG_with_Couchbase_and_Bedrock.ipynb)
-
-# Introduction
-
-In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Amazon Bedrock](https://aws.amazon.com/bedrock/) as both the embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using the FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-aws-bedrock-couchbase-rag-with-fts/).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/awsbedrock/RAG_with_Couchbase_and_Bedrock.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Get Credentials for AWS Bedrock
-* Please follow the [instructions](https://docs.aws.amazon.com/bedrock/latest/userguide/getting-started.html) to set up AWS Bedrock and generate credentials.
-* Ensure you have the necessary IAM permissions to access Bedrock services.
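-
-Optionally, you can sanity-check your credentials before continuing. The snippet below is a minimal sketch (the region is an example, and it assumes your IAM user may call `ListFoundationModels`) that simply lists the foundation models visible to your account:
-
-```python
-# Optional sanity check: confirm your AWS credentials can reach Bedrock.
-import boto3
-
-bedrock = boto3.client("bedrock", region_name="us-east-1")  # example region
-response = bedrock.list_foundation_models()
-print(f"Accessible foundation models: {len(response['modelSummaries'])}")
-```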
-
-## Create and Deploy Your Free Tier Operational Cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-# Setting the Stage: Installing Necessary Libraries
-
-To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks.
-
-
-```python
-%pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-aws boto3==1.37.35 python-dotenv==1.1.0
-```
-
-
-    [notice] A new release of pip is available: 24.3.1 -> 25.2
-    [notice] To update, run: pip install --upgrade pip
-    Note: you may need to restart the kernel to use updated packages.
-
-
-# Importing Necessary Libraries
-
-The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading.
-
-
-```python
-import getpass
-import json
-import logging
-import os
-import time
-from datetime import timedelta
-
-import boto3
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.exceptions import (CouchbaseException,
-                                  InternalServerFailureException)
-from couchbase.management.buckets import CreateBucketSettings
-from couchbase.options import ClusterOptions
-from datasets import load_dataset
-from dotenv import load_dotenv
-from langchain_aws import BedrockEmbeddings, ChatBedrock
-from langchain_core.globals import set_llm_cache
-from langchain_core.output_parsers import StrOutputParser
-from langchain_core.prompts.chat import ChatPromptTemplate
-from langchain_core.runnables import RunnablePassthrough
-from langchain_couchbase.cache import CouchbaseCache
-from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore, DistanceStrategy
-from tqdm import tqdm
-```
-
-# Setup Logging
-
-Logging is configured to track the progress of the script and capture any errors or warnings.
-
-
-```python
-logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
-```
-
-# Loading Sensitive Information
-In this section, we prompt the user to input the essential configuration settings. These settings include sensitive information like AWS credentials, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.
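-
-For reference, a `.env` file for this tutorial might look like the following. The values shown are placeholders only (the variable names and defaults mirror those read by the script below); never commit real credentials:
-
-```
-AWS_ACCESS_KEY_ID=your-access-key-id
-AWS_SECRET_ACCESS_KEY=your-secret-access-key
-AWS_REGION=us-east-1
-CB_HOST=couchbase://localhost
-CB_USERNAME=Administrator
-CB_PASSWORD=password
-CB_BUCKET_NAME=query-vector-search-testing
-SCOPE_NAME=shared
-COLLECTION_NAME=bedrock
-CACHE_COLLECTION=cache
-```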
- -The project includes an `.env.sample` file that lists all the environment variables. To get started: - -1. Create a `.env` file in the same directory as this notebook -2. Copy the contents from `.env.sample` to your `.env` file -3. Fill in the required credentials - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - - -```python - -# Load environment variables from .env file if it exists -load_dotenv(override=True) - -# AWS Credentials -AWS_ACCESS_KEY_ID = os.getenv('AWS_ACCESS_KEY_ID') or input('Enter your AWS Access Key ID: ') -AWS_SECRET_ACCESS_KEY = os.getenv('AWS_SECRET_ACCESS_KEY') or getpass.getpass('Enter your AWS Secret Access Key: ') -AWS_REGION = os.getenv('AWS_REGION') or input('Enter your AWS region (default: us-east-1): ') or 'us-east-1' - -# Couchbase Settings -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: bedrock): ') or 'bedrock' -CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Check if required credentials are set -for cred_name, cred_value in { - 'AWS_ACCESS_KEY_ID': AWS_ACCESS_KEY_ID, - 'AWS_SECRET_ACCESS_KEY': AWS_SECRET_ACCESS_KEY, - 'CB_HOST': CB_HOST, - 'CB_USERNAME': CB_USERNAME, - 'CB_PASSWORD': CB_PASSWORD, - 'CB_BUCKET_NAME': CB_BUCKET_NAME -}.items(): - if not cred_value: - raise ValueError(f"{cred_name} is not set") -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-02 12:21:07,348 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. 
Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-08-29 13:03:42,591 - INFO - Bucket 'query-vector-search-testing' does not exist. Creating it... - 2025-08-29 13:03:44,657 - INFO - Bucket 'query-vector-search-testing' created successfully. 
-    2025-08-29 13:03:44,663 - INFO - Scope 'shared' does not exist. Creating it...
-    2025-08-29 13:03:44,704 - INFO - Scope 'shared' created successfully.
-    2025-08-29 13:03:44,714 - INFO - Collection 'bedrock' does not exist. Creating it...
-    2025-08-29 13:03:44,770 - INFO - Collection 'bedrock' created successfully.
-    2025-08-29 13:03:46,953 - INFO - All documents cleared from the collection.
-    2025-08-29 13:03:46,954 - INFO - Bucket 'query-vector-search-testing' exists.
-    2025-08-29 13:03:46,969 - INFO - Collection 'cache' does not exist. Creating it...
-    2025-08-29 13:03:47,025 - INFO - Collection 'cache' created successfully.
-    2025-08-29 13:03:49,183 - INFO - All documents cleared from the collection.
-
-
-# Creating Amazon Bedrock Client and Embeddings
-
-Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. We'll use Amazon Bedrock's Titan embedding model for embeddings.
-
-## Using Amazon Bedrock's Titan Model
-
-Language models are AI systems that are trained to understand and generate human language. We'll be using Amazon Bedrock's Titan model to process user queries and generate meaningful responses. The Titan model family includes both embedding models for converting text into vector representations and text generation models for producing human-like responses.
-
-Key features of Amazon Bedrock's Titan models:
-- Titan Embeddings model for embedding vector generation
-- Titan Text model for natural language understanding and generation
-- Seamless integration with AWS infrastructure
-- Enterprise-grade security and scalability
-
-
-```python
-try:
-    bedrock_client = boto3.client(
-        service_name='bedrock-runtime',
-        region_name=AWS_REGION,
-        aws_access_key_id=AWS_ACCESS_KEY_ID,
-        aws_secret_access_key=AWS_SECRET_ACCESS_KEY
-    )
-
-    embeddings = BedrockEmbeddings(
-        client=bedrock_client,
-        model_id="amazon.titan-embed-text-v2:0"
-    )
-    logging.info("Successfully created Bedrock embeddings client")
-except Exception as e:
-    raise ValueError(f"Error creating Bedrock embeddings client: {str(e)}")
-```
-
-    2025-09-02 12:21:15,663 - INFO - Successfully created Bedrock embeddings client
-
-
-# Setting Up the Couchbase Query Vector Store
-A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, GSI converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables us to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used.
-
-The vector store requires a distance metric to determine how similarity between vectors is calculated. This is crucial for accurate semantic search results, as different distance metrics can yield different similarity rankings. Some of the supported distance strategies are dot, l2, euclidean, cosine, l2_squared, and euclidean_squared. In our implementation, we will use cosine, which is particularly effective for text embeddings.
- - -```python -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding = embeddings, - distance_metric=DistanceStrategy.COSINE - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - - 2025-09-02 12:22:15,979 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-09-02 12:21:31,880 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. - -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -## Saving Data to the Vector Store -To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. - -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Error Handling: If an error occurs, only the current batch is affected -3. Progress Tracking: Easier to monitor and track the ingestion progress -4. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. 
-The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -``` - - 2025-08-20 14:05:53,302 - INFO - Document ingestion completed successfully. - - -# Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. - -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience. - - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-09-02 12:22:20,978 - INFO - Successfully created cache - - -# Using Amazon Bedrock's Titan Text Express v1 Model - -Amazon Bedrock's Titan Text Express v1 is a state-of-the-art foundation model designed for fast and efficient text generation tasks. This model excels at: - -- Text generation and completion -- Question answering -- Summarization -- Content rewriting -- Analysis and extraction - -Key features of Titan Text Express v1: - -- Optimized for low-latency responses while maintaining high quality output -- Supports up to 8K tokens context window -- Built-in content filtering and safety controls -- Cost-effective compared to larger models -- Seamlessly integrates with AWS services - -The model uses a temperature parameter (0-1) to control randomness in responses: -- Lower values (e.g. 0) produce more focused, deterministic outputs -- Higher values introduce more creativity and variation - -We'll be using this model through Amazon Bedrock's API to process user queries and generate contextually relevant responses based on our vector database content. - - -```python -try: - llm = ChatBedrock( - client=bedrock_client, - model_id="amazon.titan-text-express-v1", - model_kwargs={"temperature": 0} - ) - logging.info("Successfully created Bedrock LLM client") -except Exception as e: - logging.error(f"Error creating Bedrock LLM client: {str(e)}. 
Please check your AWS credentials and Bedrock access.") - raise -``` - - 2025-09-02 12:22:24,513 - INFO - Successfully created Bedrock LLM client - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) - - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-02 12:23:51,477 - INFO - Semantic search completed in 1.29 seconds - - - - Semantic Search Results (completed in 1.29 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3512, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. 
Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - Littler returned to Alexandra Palace to a boisterous reception from more than 3,000 spectators and delivered an astonishing display in the fourth set. He was on for a nine-darter after his opening two throws in both of the first two legs and completed the set in 32 darts - the minimum possible is 27. The teenager will next play after Christmas against European Championship winner Ritchie Edhouse, the 29th seed, or Ian White, and is seeded to meet Humphries in the semi-finals. Having entered last year's event ranked 164th, Littler is up to fourth in the world and will go to number two if he reaches the final again this time. He has won 10 titles in his debut professional year, including the Premier League and Grand Slam of Darts. After reaching the World Championship final as a debutant aged just 16, Littler's life has been transformed and interest in darts has rocketed. Google say he was the most searched-for athlete online in the UK during 2024. This Christmas, more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies and has prompted plans to expand the World Championship. Littler was named BBC Young Sports Personality of the Year on Tuesday and was runner-up to athlete Keely Hodgkinson for the main award. - - Nick Kenny will play world champion Luke Humphries in round three after Christmas - - Barneveld was shocked 3-1 by world number 76 Kenny, who was in tears after a famous victory. Kenny, 32, will face Humphries in round three after defeating the Dutchman, who won the BDO world title four times and the PDC crown in 2007. Van Barneveld, ranked 32nd, became the sixth seed to exit in the second round. 
His compatriot Noppert, the 13th seed, was stunned 3-1 by Joyce, who will face Ryan Searle or Matt Campbell next, with the winner of that tie potentially meeting Littler in the last 16. Elsewhere, 15th seed Chris Dobey booked his place in the third round with a 3-1 win over Alexander Merkx. Englishman Dobey concluded an afternoon session which started with a trio of 3-0 scorelines. Northern Ireland's Brendan Dolan beat Lok Yin Lee to set up a meeting with three-time champion Michael van Gerwen after Christmas. In the final two first-round matches of the 2025 competition, Wales' Rhys Griffin beat Karel Sedlacek of the Czech Republic before Asia number one Alexis Toylo cruised past Richard Veenstra. - -------------------------------------------------------------------------------- - Distance: 0.4124, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. 
Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful."


    ... (output truncated for brevity)


# Optimizing Vector Search with Global Secondary Index (GSI)

While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase.

Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types:

Hyperscale Vector Indexes (BHIVE)
- Best for pure vector searches - content discovery, recommendations, semantic search
- High performance with low memory footprint - designed to scale to billions of vectors
- Optimized for concurrent operations - supports simultaneous searches and inserts
- Use when: You primarily perform vector-only queries without complex scalar filtering
- Ideal for: Large-scale semantic search, recommendation systems, content discovery

Composite Vector Indexes
- Best for filtered vector searches - combines vector search with scalar value filtering
- Efficient pre-filtering - scalar attributes reduce the vector comparison scope
- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries

Choosing the Right Index Type
- Start with Hyperscale Vector Index for pure vector searches and large datasets
- Use Composite Vector Index when scalar filters significantly reduce your search space
- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions

For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).


## Understanding Index Configuration (Couchbase 8.0 Feature)

The index_description parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

Format: `IVF[<centroids>],<quantization>`, where `<centroids>` is an optional count and `<quantization>` is either `SQ<bits>` or `PQ<subquantizers>x<bits>`

Centroids (IVF - Inverted File):
- Controls how the dataset is subdivided for faster searches
- More centroids = faster search, slower training
- Fewer centroids = slower search, faster training
- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size

Quantization Options:
- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
- Higher values = better accuracy, larger index size

Common Examples:
- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).
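One practical note: re-running `create_index` with a name that already exists may fail. If you plan to re-run this notebook, a small guard like the sketch below keeps the cell re-runnable (the index name matches the one created in the next cell; `DROP INDEX ... ON bucket.scope.collection` is standard SQL++):


```python
# Optional: drop the index if it already exists, so the create step can be re-run.
drop_query = f"DROP INDEX `bedrock_bhive_index` ON `{CB_BUCKET_NAME}`.`{SCOPE_NAME}`.`{COLLECTION_NAME}`"
try:
    cluster.query(drop_query).execute()
    logging.info("Dropped existing index 'bedrock_bhive_index'.")
except Exception:
    logging.info("No existing index 'bedrock_bhive_index' to drop.")
```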
In the code below, we demonstrate creating a BHIVE index. This method takes an index type (BHIVE or COMPOSITE) and a description parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI.


```python
from langchain_couchbase.vectorstores import IndexType
vector_store.create_index(index_type=IndexType.BHIVE, index_name="bedrock_bhive_index", index_description="IVF,SQ8")
```

The example below shows running the same similarity search, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data.

**Important**: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria.

Note: In GSI vector search, the distance represents the vector distance between the query and document embeddings. A lower distance indicates higher similarity, while a higher distance indicates lower similarity.


```python
query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    search_elapsed_time = time.time() - start_time

    logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds")

    # Display search results
    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
    print("-" * 80)

    for doc, score in search_results:
        print(f"Distance: {score:.4f}, Text: {doc.page_content}")
        print("-" * 80)

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")
```

    2025-09-02 12:24:54,503 - INFO - Semantic search completed in 0.36 seconds


    Semantic Search Results (completed in 0.36 seconds):
    --------------------------------------------------------------------------------
    Distance: 0.3512, Text: Luke Littler has risen from 164th to fourth in the rankings in a year

    A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20.
Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97. - - Littler was hugged by his parents after victory over Meikle - - Littler returned to Alexandra Palace to a boisterous reception from more than 3,000 spectators and delivered an astonishing display in the fourth set. He was on for a nine-darter after his opening two throws in both of the first two legs and completed the set in 32 darts - the minimum possible is 27. The teenager will next play after Christmas against European Championship winner Ritchie Edhouse, the 29th seed, or Ian White, and is seeded to meet Humphries in the semi-finals. Having entered last year's event ranked 164th, Littler is up to fourth in the world and will go to number two if he reaches the final again this time. He has won 10 titles in his debut professional year, including the Premier League and Grand Slam of Darts. After reaching the World Championship final as a debutant aged just 16, Littler's life has been transformed and interest in darts has rocketed. Google say he was the most searched-for athlete online in the UK during 2024. This Christmas, more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies and has prompted plans to expand the World Championship. Littler was named BBC Young Sports Personality of the Year on Tuesday and was runner-up to athlete Keely Hodgkinson for the main award. - - Nick Kenny will play world champion Luke Humphries in round three after Christmas - - Barneveld was shocked 3-1 by world number 76 Kenny, who was in tears after a famous victory. Kenny, 32, will face Humphries in round three after defeating the Dutchman, who won the BDO world title four times and the PDC crown in 2007. Van Barneveld, ranked 32nd, became the sixth seed to exit in the second round. His compatriot Noppert, the 13th seed, was stunned 3-1 by Joyce, who will face Ryan Searle or Matt Campbell next, with the winner of that tie potentially meeting Littler in the last 16. Elsewhere, 15th seed Chris Dobey booked his place in the third round with a 3-1 win over Alexander Merkx. Englishman Dobey concluded an afternoon session which started with a trio of 3-0 scorelines. Northern Ireland's Brendan Dolan beat Lok Yin Lee to set up a meeting with three-time champion Michael van Gerwen after Christmas. In the final two first-round matches of the 2025 competition, Wales' Rhys Griffin beat Karel Sedlacek of the Czech Republic before Asia number one Alexis Toylo cruised past Richard Veenstra. 
- -------------------------------------------------------------------------------- - Distance: 0.4124, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - - ... 
(output truncated for brevity) - - -Note: To create a COMPOSITE index, the below code can be used. -Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles. - - -```python -from langchain_couchbase.vectorstores import IndexType -vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="bedrock_composite_index", index_description="IVF,SQ8") -``` - -# Retrieval-Augmented Generation (RAG) with Couchbase and LangChain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -# Create RAG prompt template -rag_prompt = ChatPromptTemplate.from_messages([ - ("system", "You are a helpful assistant that answers questions based on the provided context."), - ("human", "Context: {context}\n\nQuestion: {question}") -]) - -# Create RAG chain -rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | rag_prompt - | llm - | StrOutputParser() -) -logging.info("Successfully created RAG chain") -``` - - 2025-09-02 12:25:08,521 - INFO - Successfully created RAG chain - - - -```python -start_time = time.time() -# Turn off excessive Logging -logging.basicConfig(level=logging.WARNING, format='%(asctime)s - %(levelname)s - %(message)s', force=True) - -try: - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: - Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. 
    Littler was overcome with emotion at the end
    RAG response generated in 0.41 seconds


# Using Couchbase as a caching mechanism
Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key.

For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently.


```python
try:
    queries = [
        "What happened in the match between Fulham and Liverpool?",
        "What were Luke Littler's key achievements and records in his recent PDC World Championship match?",
        "What happened in the match between Fulham and Liverpool?",  # Repeated query
    ]

    for i, query in enumerate(queries, 1):
        print(f"\nQuery {i}: {query}")
        start_time = time.time()

        response = rag_chain.invoke(query)
        elapsed_time = time.time() - start_time
        print(f"Response: {response}")
        print(f"Time taken: {elapsed_time:.2f} seconds")

except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")
```


    Query 1: What happened in the match between Fulham and Liverpool?
    Response: The match between Fulham and Liverpool ended in a 2-2 draw.
    Time taken: 2.30 seconds

    Query 2: What were Luke Littler's key achievements and records in his recent PDC World Championship match?
    Response:
    Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end
    Time taken: 0.40 seconds

    Query 3: What happened in the match between Fulham and Liverpool?
    Response: The match between Fulham and Liverpool ended in a 2-2 draw.
    Time taken: 0.36 seconds


## Conclusion
By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and AWS Bedrock. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search, and of how GSI-based vector search makes querying data more efficient and can significantly improve your RAG performance.
Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Claude(by_Anthropic).md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Claude(by_Anthropic).md
deleted file mode 100644
index 6cb6f4c..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Claude(by_Anthropic).md
+++ /dev/null
@@ -1,756 +0,0 @@
---
# frontmatter
path: "/tutorial-openai-claude-couchbase-rag-with-global-secondary-index"
title: Retrieval-Augmented Generation (RAG) with Couchbase, OpenAI, and Claude using GSI index
short_title: RAG with Couchbase, OpenAI, and Claude using GSI index
description:
  - Learn how to build a semantic search engine using Couchbase, OpenAI embeddings, and Anthropic's Claude using GSI.
  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with OpenAI embeddings and use Claude as the language model.
  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - Artificial Intelligence
  - LangChain
  - OpenAI
sdk_language:
  - python
length: 60 Mins
---


[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/claudeai/gsi/RAG_with_Couchbase_and_Claude(by_Anthropic).ipynb)

# Introduction
In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [OpenAI](https://openai.com/) as the AI-powered embedding provider, and [Anthropic](https://claude.ai/) as the language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using the FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-openai-claude-couchbase-rag-with-fts/).

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/claudeai/RAG_with_Couchbase_and_Claude(by_Anthropic).ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start

## Get Credentials for OpenAI and Anthropic

* Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials.
* Please follow the [instructions](https://docs.anthropic.com/en/api/getting-started) to generate the Anthropic credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster.
This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.

* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

# Setting the Stage: Installing Necessary Libraries
To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, OpenAI provides the models for generating embeddings, and Claude (by Anthropic) provides the language model for understanding and generating natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search.


```python
%pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-anthropic==0.3.19 langchain-openai==0.3.32 python-dotenv==1.1.1
```

    Note: you may need to restart the kernel to use updated packages.


# Importing Necessary Libraries
The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.


```python
import getpass
import json
import logging
import os
import time
from datetime import timedelta
from multiprocessing import AuthenticationError

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (CouchbaseException,
                                  InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_anthropic import ChatAnthropic
from langchain_core.globals import set_llm_cache
from langchain_core.prompts.chat import (ChatPromptTemplate,
                                         HumanMessagePromptTemplate,
                                         SystemMessagePromptTemplate)
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import (CouchbaseQueryVectorStore,
                                              DistanceStrategy, IndexType)
```

# Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution.
The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.


```python
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Disable all logging except critical to prevent OpenAI API request logs
logging.getLogger("httpx").setLevel(logging.CRITICAL)
```

# Loading Sensitive Information
In this section, we prompt the user to input the essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.

The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.


```python
load_dotenv()

# Load from environment variables, prompt for input if unset, and fall back to defaults
ANTHROPIC_API_KEY = os.getenv('ANTHROPIC_API_KEY') or getpass.getpass('Enter your Anthropic API key: ')
OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API key: ')
CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: claude): ') or 'claude'
CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'

# Check if the variables are correctly loaded
if not ANTHROPIC_API_KEY:
    raise ValueError("ANTHROPIC_API_KEY is not set in the environment.")
if not OPENAI_API_KEY:
    raise ValueError("OPENAI_API_KEY is not set in the environment.")
```

# Connecting to the Couchbase Cluster
Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.
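If you are connecting to a Capella cluster rather than a local one, use the TLS connection string from the Capella UI (it begins with `couchbases://`). As an optional sketch, assuming a Capella-style endpoint (the hostname below is a placeholder), the Python SDK's `wan_development` profile can be applied to relax timeouts over higher-latency public-internet links:


```python
# Optional: apply the WAN profile before connecting to a remote (e.g. Capella) cluster.
# The endpoint is a placeholder - substitute the one from your Capella "Connect" tab.
wan_options = ClusterOptions(PasswordAuthenticator(CB_USERNAME, CB_PASSWORD))
wan_options.apply_profile("wan_development")
# cluster = Cluster("couchbases://cb.xxxxxxxx.cloud.couchbase.com", wan_options)
```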
- - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-09-09 12:15:22,899 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-09-09 12:15:26,795 - INFO - Bucket 'query-vector-search-testing' exists. - - - 2025-09-09 12:15:26,808 - INFO - Collection 'claude' does not exist. Creating it... - 2025-09-09 12:15:26,854 - INFO - Collection 'claude' created successfully. - 2025-09-09 12:15:29,065 - INFO - All documents cleared from the collection. - 2025-09-09 12:15:29,066 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-09 12:15:29,074 - INFO - Collection 'cache' already exists. Skipping creation. - 2025-09-09 12:15:31,115 - INFO - All documents cleared from the collection. - - - - - - - - - -# Creating OpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using OpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - - - -```python -try: - embeddings = OpenAIEmbeddings(openai_api_key=OPENAI_API_KEY, model='text-embedding-3-small') - logging.info("Successfully created OpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating OpenAIEmbeddings: {str(e)}") -``` - - 2025-09-09 12:15:54,388 - INFO - Successfully created OpenAIEmbeddings - - -# Setting Up the Couchbase Query Vector Store -A vector store is where we'll keep our embeddings. The query vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, GSI converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables us to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. - -The vector store requires a distance metric to determine how similarity between vectors is calculated. This is crucial for accurate semantic search results as different distance metrics can yield different similarity rankings. Some of the supported Distance strategies are dot, l2, euclidean, cosine, l2_squared, euclidean_squared. 
In our implementation we will use cosine, which is particularly effective for text embeddings.


```python
try:
    vector_store = CouchbaseQueryVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        distance_metric=DistanceStrategy.COSINE
    )
    logging.info("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")

```

    2025-09-09 12:16:02,578 - INFO - Successfully created vector store


# Load the BBC News Dataset
To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.


```python
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
```

    2025-09-09 12:16:16,461 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


    Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.


```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.


## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches of 100 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. Memory Efficiency: Processing in smaller batches prevents memory overload
2. Progress Tracking: Easier to monitor and track the ingestion progress
3. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 100 to ensure reliable operation.
-The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -batch_size = 100 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -``` - - 2025-09-09 12:18:40,320 - INFO - Document ingestion completed successfully. - - -# Setting Up a Couchbase Cache -To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds data that is frequently accessed, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly. - -Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience. - - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-09-09 12:18:47,269 - INFO - Successfully created cache - - -# Using the Claude 4 Sonnet Language Model (LLM) -Language models are AI systems that are trained to understand and generate human language. We'll be using the `Claude 4 Sonnet` language model to process user queries and generate meaningful responses. This model is a key component of our semantic search engine, allowing it to go beyond simple keyword matching and truly understand the intent behind a query. By creating this language model, we equip our search engine with the ability to interpret complex queries, understand the nuances of language, and provide more accurate and contextually relevant responses. - -The language model's ability to understand context and generate coherent responses is what makes our search engine truly intelligent. It can not only find the right information but also present it in a way that is useful and understandable to the user. - - - - -```python -try: - llm = ChatAnthropic(temperature=0.1, anthropic_api_key=ANTHROPIC_API_KEY, model_name='claude-sonnet-4-20250514') - logging.info("Successfully created ChatAnthropic") -except Exception as e: - logging.error(f"Error creating ChatAnthropic: {str(e)}. Please check your API key and network connection.") - raise -``` - - 2025-09-09 12:20:36,212 - INFO - Successfully created ChatAnthropic - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. 
When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-09 12:21:34,292 - INFO - Semantic search completed in 1.91 seconds - - - - Semantic Search Results (completed in 1.91 seconds): - -------------------------------------------------------------------------------- - Score: 0.2502, Text: A map shown during the draw for the 2026 Fifa World Cup has been criticised by Ukraine as an "unacceptable error" after it appeared to exclude Crimea as part of the country. The graphic - showing countries that cannot be drawn to play each other for geopolitical reasons - highlighted Ukraine but did not include the peninsula that is internationally recognised to be part of it. Crimea has been under Russian occupation since 2014 and just a handful of countries recognise the peninsula as Russian territory. Ukraine Foreign Ministry spokesman Heorhiy Tykhy said that the nation expects "a public apology". Fifa said it was "aware of an issue" and the image had been removed. 
- - Writing on X, Tykhy said that Fifa had not only "acted against international law" but had also "supported Russian propaganda, war crimes, and the crime of aggression against Ukraine". He added a "fixed" version of the map to his post, highlighting Crimea as part of Ukraine's territory. Among the countries that cannot play each other are Ukraine and Belarus, Spain and Gibraltar and Kosovo versus either Bosnia and Herzegovina or Serbia. - - This Twitter post cannot be displayed in your browser. Please enable Javascript or try a different browser. View original content on Twitter The BBC is not responsible for the content of external sites. Skip twitter post by Heorhii Tykhyi This article contains content provided by Twitter. We ask for your permission before anything is loaded, as they may be using cookies and other technologies. You may want to read Twitter’s cookie policy, external and privacy policy, external before accepting. To view this content choose ‘accept and continue’. The BBC is not responsible for the content of external sites. - - The Ukrainian Football Association has also sent a letter to Fifa secretary-general Mathias Grafström and UEFA secretary-general Theodore Theodoridis over the matter. "We appeal to you to express our deep concern about the infographic map [shown] on December 13, 2024," the letter reads. "Taking into account a number of official decisions and resolutions adopted by the Fifa Council and the UEFA executive committee since 2014... we emphasize that today's version of the cartographic image of Ukraine... is completely unacceptable and looks like an inconsistent position of Fifa and UEFA." The 2026 World Cup will start on 11 June that year in Mexico City and end on 19 July in New Jersey. The expanded 48-team tournament will last a record 39 days. Ukraine were placed in Group D alongside Iceland, Azerbaijan and the yet-to-be-determined winners of France's Nations League quarter-final against Croatia. - -------------------------------------------------------------------------------- - Score: 0.5698, Text: Defending champions Manchester City will face Juventus in the group stage of the Fifa Club World Cup next summer, while Chelsea meet Brazilian side Flamengo. Pep Guardiola's City, who beat Brazilian side Fluminense to win the tournament for the first time in 2023, begin their title defence against Morocco's Wydad and also play Al Ain of the United Arab Emirates in Group G. Chelsea, winners of the 2021 final, were also drawn alongside Mexico's Club Leon and Tunisian side Esperance Sportive de Tunisie in Group D. The revamped Fifa Club World Cup, which has been expanded to 32 teams, will take place in the United States between 15 June and 13 July next year. - - A complex and lengthy draw ceremony was held across two separate Miami locations and lasted more than 90 minutes, during which a new Club World Cup trophy was revealed. There was also a video message from incoming US president Donald Trump, whose daughter Ivanka drew the first team. Lionel Messi's Inter Miami will take on Egyptian side Al Ahly at the Hard Rock Stadium in the opening match, staged in Miami. Elsewhere, Paris St-Germain were drawn against Atletico Madrid in Group B, while Bayern Munich meet Benfica in another all-European group-stage match-up. Teams will play each other once in the group phase and the top two will progress to the knockout stage. - - This video can not be played To play this video you need to enable JavaScript in your browser. What is the Club World Cup? 
- - Teams from each of the six international football confederations will be represented at next summer's tournament, including 12 European clubs - the highest quota of any confederation. The European places were decided by clubs' Champions League performances over the past four seasons, with recent winners Chelsea, Manchester City and Real Madrid guaranteed places. Al Ain, the most successful club in the UAE with 14 league titles, are owned by the country's president Sheikh Mohamed bin Zayed Al Nahyan - the older brother of City owner Sheikh Mansour. Real, who lifted the Fifa Club World Cup trophy for a record-extending fifth time in 2022, will open up against Saudi Pro League champions Al-Hilal, who currently have Neymar in their ranks. One place was reserved for a club from the host nation, which Fifa controversially awarded to Inter Miami, who will contest the tournament curtain-raiser. Messi's side were winners of the regular-season MLS Supporters' Shield but beaten in the MLS play-offs, meaning they are not this season's champions. - • None How does the new Club World Cup work & why is it so controversial? - - Matches will be played across 12 venues in the US which, alongside Canada and Mexico, also host the 2026 World Cup. Fifa is facing legal action from player unions and leagues about the scheduling of the event, which begins two weeks after the Champions League final at the end of the 2024-25 European calendar and ends five weeks before the first Premier League match of the 2025-2026 season. But football's world governing body believes the dates allow sufficient rest time before the start of the domestic campaigns. The Club World Cup will now take place once every four years, when it was previously held annually and involved just seven teams. Streaming platform DAZN has secured exclusive rights to broadcast next summer's tournament, during which 63 matches will take place over 29 days. - -------------------------------------------------------------------------------- - Score: 0.5792, Text: After Fifa awards Saudi Arabia the hosting rights for the men's 2034 World Cup, BBC analysis editor Ros Atkins looks at how we got here and the controversies surrounding the decision. - -------------------------------------------------------------------------------- - Score: 0.5877, Text: FA still to decide on endorsing Saudi World Cup bid - - ... (output truncated for brevity) - - -# Optimizing Vector Search with Global Secondary Index (GSI) - -While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase. 
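-
-To build some intuition for what that optimization means, the sketch below shows roughly the kind of SQL++ statement that a GSI-backed vector search boils down to. Treat it as an illustration only: the `APPROX_VECTOR_DISTANCE` call, the default field names (`text`, `embedding`), and the exact query shape are assumptions about what `CouchbaseQueryVectorStore` issues on your behalf, not code this tutorial requires you to run.
-
-
-```python
-# Illustrative sketch only (assumed field names "text"/"embedding" and assumed
-# APPROX_VECTOR_DISTANCE usage) -- the vector store builds queries like this for you.
-from couchbase.options import QueryOptions
-
-query_vec = embeddings.embed_query("2026 FIFA World Cup draw map controversy")
-sqlpp = f"""
-SELECT t.text, APPROX_VECTOR_DISTANCE(t.embedding, $qvec, "COSINE") AS distance
-FROM `{CB_BUCKET_NAME}`.`{SCOPE_NAME}`.`{COLLECTION_NAME}` AS t
-ORDER BY distance
-LIMIT 5
-"""
-for row in cluster.query(sqlpp, QueryOptions(named_parameters={"qvec": query_vec})):
-    print(f"{row['distance']:.4f}  {row['text'][:80]}")
-```
-
-A vector index lets the Query service answer the `ORDER BY ... LIMIT` over the distance function approximately, instead of computing the distance against every stored vector, which is where the speedup comes from.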
-
-Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types:
-
-Hyperscale Vector Indexes (BHIVE)
-- Best for pure vector searches - content discovery, recommendations, semantic search
-- High performance with low memory footprint - designed to scale to billions of vectors
-- Optimized for concurrent operations - supports simultaneous searches and inserts
-- Use when: You primarily perform vector-only queries without complex scalar filtering
-- Ideal for: Large-scale semantic search, recommendation systems, content discovery
-
-Composite Vector Indexes
-- Best for filtered vector searches - combines vector search with scalar value filtering
-- Efficient pre-filtering - scalar attributes reduce the vector comparison scope
-- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
-- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries
-
-Choosing the Right Index Type
-- Start with Hyperscale Vector Index for pure vector searches and large datasets
-- Use Composite Vector Index when scalar filters significantly reduce your search space
-- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions
-
-For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).
-
-
-## Understanding Index Configuration (Couchbase 8.0 Feature)
-
-The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:
-
-Format: `IVF[<centroids>],{PQ<subquantizers>x<bits>|SQ<bits>}`
-
-Centroids (IVF - Inverted File):
-- Controls how the dataset is subdivided for faster searches
-- More centroids = faster search, slower training
-- Fewer centroids = slower search, faster training
-- If omitted (as in IVF,SQ8), Couchbase auto-selects a value based on dataset size
-
-Quantization Options:
-- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
-- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
-- Higher values = better accuracy, larger index size
-
-Common Examples:
-- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
-- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
-- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits each
-
-For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).
-
-In the code below, we demonstrate creating a BHIVE index. The `create_index` method takes an index type (`BHIVE` or `COMPOSITE`) and an `index_description` parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI.
-
-
-```python
-vector_store.create_index(index_type=IndexType.BHIVE, index_name="claude_bhive_index", index_description="IVF,SQ8")
-```
-
-
-```python
-query = "What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy?"
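-# Re-run the same semantic search as before. With the BHIVE index in place, the
-# Query service should now be able to serve this search from the GSI vector
-# index, so compare the elapsed time below with the pre-index run above.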
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-09 12:26:01,504 - INFO - Semantic search completed in 0.44 seconds - - - - Semantic Search Results (completed in 0.44 seconds): - -------------------------------------------------------------------------------- - Score: 0.2502, Text: A map shown during the draw for the 2026 Fifa World Cup has been criticised by Ukraine as an "unacceptable error" after it appeared to exclude Crimea as part of the country. The graphic - showing countries that cannot be drawn to play each other for geopolitical reasons - highlighted Ukraine but did not include the peninsula that is internationally recognised to be part of it. Crimea has been under Russian occupation since 2014 and just a handful of countries recognise the peninsula as Russian territory. Ukraine Foreign Ministry spokesman Heorhiy Tykhy said that the nation expects "a public apology". Fifa said it was "aware of an issue" and the image had been removed. - - Writing on X, Tykhy said that Fifa had not only "acted against international law" but had also "supported Russian propaganda, war crimes, and the crime of aggression against Ukraine". He added a "fixed" version of the map to his post, highlighting Crimea as part of Ukraine's territory. Among the countries that cannot play each other are Ukraine and Belarus, Spain and Gibraltar and Kosovo versus either Bosnia and Herzegovina or Serbia. - - This Twitter post cannot be displayed in your browser. Please enable Javascript or try a different browser. View original content on Twitter The BBC is not responsible for the content of external sites. Skip twitter post by Heorhii Tykhyi This article contains content provided by Twitter. We ask for your permission before anything is loaded, as they may be using cookies and other technologies. You may want to read Twitter’s cookie policy, external and privacy policy, external before accepting. To view this content choose ‘accept and continue’. The BBC is not responsible for the content of external sites. - - The Ukrainian Football Association has also sent a letter to Fifa secretary-general Mathias Grafström and UEFA secretary-general Theodore Theodoridis over the matter. "We appeal to you to express our deep concern about the infographic map [shown] on December 13, 2024," the letter reads. "Taking into account a number of official decisions and resolutions adopted by the Fifa Council and the UEFA executive committee since 2014... we emphasize that today's version of the cartographic image of Ukraine... is completely unacceptable and looks like an inconsistent position of Fifa and UEFA." The 2026 World Cup will start on 11 June that year in Mexico City and end on 19 July in New Jersey. The expanded 48-team tournament will last a record 39 days. 
Ukraine were placed in Group D alongside Iceland, Azerbaijan and the yet-to-be-determined winners of France's Nations League quarter-final against Croatia. - -------------------------------------------------------------------------------- - Score: 0.5698, Text: Defending champions Manchester City will face Juventus in the group stage of the Fifa Club World Cup next summer, while Chelsea meet Brazilian side Flamengo. Pep Guardiola's City, who beat Brazilian side Fluminense to win the tournament for the first time in 2023, begin their title defence against Morocco's Wydad and also play Al Ain of the United Arab Emirates in Group G. Chelsea, winners of the 2021 final, were also drawn alongside Mexico's Club Leon and Tunisian side Esperance Sportive de Tunisie in Group D. The revamped Fifa Club World Cup, which has been expanded to 32 teams, will take place in the United States between 15 June and 13 July next year. - - A complex and lengthy draw ceremony was held across two separate Miami locations and lasted more than 90 minutes, during which a new Club World Cup trophy was revealed. There was also a video message from incoming US president Donald Trump, whose daughter Ivanka drew the first team. Lionel Messi's Inter Miami will take on Egyptian side Al Ahly at the Hard Rock Stadium in the opening match, staged in Miami. Elsewhere, Paris St-Germain were drawn against Atletico Madrid in Group B, while Bayern Munich meet Benfica in another all-European group-stage match-up. Teams will play each other once in the group phase and the top two will progress to the knockout stage. - - This video can not be played To play this video you need to enable JavaScript in your browser. What is the Club World Cup? - - Teams from each of the six international football confederations will be represented at next summer's tournament, including 12 European clubs - the highest quota of any confederation. The European places were decided by clubs' Champions League performances over the past four seasons, with recent winners Chelsea, Manchester City and Real Madrid guaranteed places. Al Ain, the most successful club in the UAE with 14 league titles, are owned by the country's president Sheikh Mohamed bin Zayed Al Nahyan - the older brother of City owner Sheikh Mansour. Real, who lifted the Fifa Club World Cup trophy for a record-extending fifth time in 2022, will open up against Saudi Pro League champions Al-Hilal, who currently have Neymar in their ranks. One place was reserved for a club from the host nation, which Fifa controversially awarded to Inter Miami, who will contest the tournament curtain-raiser. Messi's side were winners of the regular-season MLS Supporters' Shield but beaten in the MLS play-offs, meaning they are not this season's champions. - • None How does the new Club World Cup work & why is it so controversial? - - Matches will be played across 12 venues in the US which, alongside Canada and Mexico, also host the 2026 World Cup. Fifa is facing legal action from player unions and leagues about the scheduling of the event, which begins two weeks after the Champions League final at the end of the 2024-25 European calendar and ends five weeks before the first Premier League match of the 2025-2026 season. But football's world governing body believes the dates allow sufficient rest time before the start of the domestic campaigns. The Club World Cup will now take place once every four years, when it was previously held annually and involved just seven teams. 
Streaming platform DAZN has secured exclusive rights to broadcast next summer's tournament, during which 63 matches will take place over 29 days. - -------------------------------------------------------------------------------- - Score: 0.5792, Text: After Fifa awards Saudi Arabia the hosting rights for the men's 2034 World Cup, BBC analysis editor Ros Atkins looks at how we got here and the controversies surrounding the decision. - -------------------------------------------------------------------------------- - Score: 0.5877, Text: FA still to decide on endorsing Saudi World Cup bid - - ... (output truncated for brevity) - - -Note: To create a COMPOSITE index, the below code can be used. -Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles. - - -```python -vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="claude_composite_index", index_description="IVF,SQ8") -``` - -# Retrieval-Augmented Generation (RAG) with Couchbase and LangChain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -system_template = "You are a helpful assistant that answers questions based on the provided context." -system_message_prompt = SystemMessagePromptTemplate.from_template(system_template) - -human_template = "Context: {context}\n\nQuestion: {question}" -human_message_prompt = HumanMessagePromptTemplate.from_template(human_template) - -chat_prompt = ChatPromptTemplate.from_messages([ - system_message_prompt, - human_message_prompt -]) - -def format_docs(docs): - return "\n\n".join(doc.page_content for doc in docs) - -rag_chain = ( - {"context": lambda x: format_docs(vector_store.similarity_search(x)), "question": RunnablePassthrough()} - | chat_prompt - | llm -) -logging.info("Successfully created RAG chain") -``` - - 2025-09-09 12:26:10,540 - INFO - Successfully created RAG chain - - - -```python -try: - start_time = time.time() - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - - print(f"RAG Response: {rag_response.content}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except AuthenticationError as e: - print(f"Authentication error: {str(e)}") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. 
Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: During the draw for the 2026 FIFA World Cup, a map was shown that excluded Crimea as part of Ukraine. This graphic, which was displaying countries that cannot be drawn to play each other for geopolitical reasons, highlighted Ukraine but did not include the Crimean peninsula, which is internationally recognized as Ukrainian territory. - - This omission sparked significant controversy because Crimea has been under Russian occupation since 2014, but only a handful of countries recognize it as Russian territory. The Ukrainian Foreign Ministry spokesman, Heorhiy Tykhy, called this an "unacceptable error" and stated that Ukraine expected "a public apology" from FIFA. He criticized FIFA for acting "against international law" and supporting "Russian propaganda, war crimes, and the crime of aggression against Ukraine." - - The Ukrainian Football Association also sent a formal letter of complaint to FIFA and UEFA officials expressing their "deep concern" about the cartographic representation. FIFA acknowledged they were "aware of an issue" and subsequently removed the image. - RAG response generated in 8.68 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. - - - -```python -try: - queries = [ - "What happened when Apple's AI feature generated a false BBC headline about a murder case in New York?", - "What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy?", # Repeated query - "What happened when Apple's AI feature generated a false BBC headline about a murder case in New York?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response.content}") - print(f"Time taken: {elapsed_time:.2f} seconds") -except AuthenticationError as e: - print(f"Authentication error: {str(e)}") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. 
Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - - Query 1: What happened when Apple's AI feature generated a false BBC headline about a murder case in New York? - Response: According to the context, Apple Intelligence (an AI feature that summarizes notifications) generated a false headline that made it appear as if BBC News had published an article claiming Luigi Mangione, who was arrested for the murder of healthcare insurance CEO Brian Thompson in New York, had shot himself. This was completely false - Mangione had not shot himself. - - The BBC complained to Apple about this misrepresentation, with a BBC spokesperson stating they had "contacted Apple to raise this concern and fix the problem." The spokesperson emphasized that it's "essential" that audiences can trust information published under the BBC name, including notifications. - - This wasn't an isolated incident, as the context mentions that Apple's AI feature also misrepresented a New York Times article, incorrectly summarizing it as "Netanyahu arrested" when the actual article was about the International Criminal Court issuing an arrest warrant for the Israeli prime minister. - Time taken: 6.22 seconds - - Query 2: What happened with the map shown during the 2026 FIFA World Cup draw regarding Ukraine and Crimea? What was the controversy? - Response: During the draw for the 2026 FIFA World Cup, a map was shown that excluded Crimea as part of Ukraine. This graphic, which was displaying countries that cannot be drawn to play each other for geopolitical reasons, highlighted Ukraine but did not include the Crimean peninsula, which is internationally recognized as Ukrainian territory. - - This omission sparked significant controversy because Crimea has been under Russian occupation since 2014, but only a handful of countries recognize it as Russian territory. The Ukrainian Foreign Ministry spokesman, Heorhiy Tykhy, called this an "unacceptable error" and stated that Ukraine expected "a public apology" from FIFA. He criticized FIFA for acting "against international law" and supporting "Russian propaganda, war crimes, and the crime of aggression against Ukraine." - - The Ukrainian Football Association also sent a formal letter of complaint to FIFA and UEFA officials expressing their "deep concern" about the cartographic representation. FIFA acknowledged they were "aware of an issue" and subsequently removed the image. - Time taken: 0.47 seconds - - Query 3: What happened when Apple's AI feature generated a false BBC headline about a murder case in New York? - Response: According to the context, Apple Intelligence (an AI feature that summarizes notifications) generated a false headline that made it appear as if BBC News had published an article claiming Luigi Mangione, who was arrested for the murder of healthcare insurance CEO Brian Thompson in New York, had shot himself. This was completely false - Mangione had not shot himself. - - The BBC complained to Apple about this misrepresentation, with a BBC spokesperson stating they had "contacted Apple to raise this concern and fix the problem." The spokesperson emphasized that it's "essential" that audiences can trust information published under the BBC name, including notifications. 
-
-    This wasn't an isolated incident, as the context mentions that Apple's AI feature also misrepresented a New York Times article, incorrectly summarizing it as "Netanyahu arrested" when the actual article was about the International Criminal Court issuing an arrest warrant for the Israeli prime minister.
-    Time taken: 0.46 seconds
-
-
-## Conclusion
-By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and Claude (by Anthropic). This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and of how GSI makes querying your data more efficient, which can significantly improve your RAG performance. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Cohere.md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Cohere.md
deleted file mode 100644
index 8d3fb85..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Cohere.md
+++ /dev/null
@@ -1,728 +0,0 @@
----
-# frontmatter
-path: "/tutorial-cohere-couchbase-rag-with-global-secondary-index"
-title: Retrieval-Augmented Generation (RAG) with Couchbase and Cohere with GSI
-short_title: RAG with Couchbase and Cohere with GSI
-description:
-  - Learn how to build a semantic search engine using Couchbase and Cohere with GSI.
-  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Cohere embeddings and language models.
-  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - Artificial Intelligence
-  - LangChain
-  - Cohere
-sdk_language:
-  - python
-length: 60 Mins
----
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/cohere/gsi/RAG_with_Couchbase_and_Cohere.ipynb)
-
-# Introduction
-In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Cohere](https://cohere.com/) as the AI-powered embedding and language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using the FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-cohere-couchbase-rag-with-fts/).
-
-# How to run this tutorial
-
-This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/cohere/RAG_with_Couchbase_and_Cohere.ipynb).
-
-You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
-
-# Before you start
-
-## Get Credentials for Cohere
-
-Please follow the [instructions](https://dashboard.cohere.com/welcome/register) to generate the Cohere credentials.
-
-## Create and Deploy Your Free Tier Operational cluster on Capella
-
-To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.
-
-To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.
-
-### Couchbase Capella Configuration
-
-When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.
-
-* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
-* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
-
-# Setting the Stage: Installing Necessary Libraries
-To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks.
-
-
-```python
-%pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-cohere==0.4.5 python-dotenv==1.1.1
-```
-
-    Note: you may need to restart the kernel to use updated packages.
-
-
-# Importing Necessary Libraries
-The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.
-
-
-```python
-import getpass
-import json
-import logging
-import os
-import time
-from datetime import timedelta
-from uuid import uuid4
-
-from couchbase.auth import PasswordAuthenticator
-from couchbase.cluster import Cluster
-from couchbase.exceptions import (CouchbaseException,
-                                  InternalServerFailureException,
-                                  QueryIndexAlreadyExistsException,
-                                  ServiceUnavailableException)
-from couchbase.management.buckets import CreateBucketSettings
-from couchbase.management.search import SearchIndex
-from couchbase.options import ClusterOptions
-from datasets import load_dataset
-from dotenv import load_dotenv
-from langchain_cohere import ChatCohere, CohereEmbeddings
-from langchain_core.globals import set_llm_cache
-from langchain_core.output_parsers import StrOutputParser
-from langchain_core.prompts import ChatPromptTemplate
-from langchain_core.runnables import RunnablePassthrough
-from langchain_couchbase.cache import CouchbaseCache
-from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
-from langchain_couchbase.vectorstores import DistanceStrategy
-from langchain_couchbase.vectorstores import IndexType
-```
-
-# Setup Logging
-Logging is configured to track the progress of the script and capture any errors or warnings.
This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.
-
-
-```python
-logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
-
-# Suppress excessive logging
-logging.getLogger('openai').setLevel(logging.WARNING)
-logging.getLogger('httpx').setLevel(logging.WARNING)
-logging.getLogger('langchain_cohere').setLevel(logging.ERROR)
-
-```
-
-# Loading Sensitive Information
-In this section, we prompt the user to input essential configuration settings needed for integrating Couchbase with Cohere's API. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.
-
-The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.
-
-
-```python
-load_dotenv()
-
-COHERE_API_KEY = os.getenv('COHERE_API_KEY') or getpass.getpass('Enter your Cohere API key: ')
-CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
-CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
-CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
-CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing'
-SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
-COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: cohere): ') or 'cohere'
-CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'
-
-# Check if the variables are correctly loaded
-if not COHERE_API_KEY:
-    raise ValueError("COHERE_API_KEY is not provided and is required.")
-```
-
-# Connect to Couchbase
-The script attempts to establish a connection to the Couchbase database using the credentials retrieved from the environment variables. Couchbase is a NoSQL database known for its flexibility, scalability, and support for various data models, including document-based storage. The connection is authenticated using a username and password, and the script waits until the connection is fully established before proceeding.
-
-
-```python
-try:
-    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
-    options = ClusterOptions(auth)
-    cluster = Cluster(CB_HOST, options)
-    cluster.wait_until_ready(timedelta(seconds=5))
-    logging.info("Successfully connected to Couchbase")
-except Exception as e:
-    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")
-```
-
-    2025-09-22 12:56:30,972 - INFO - Successfully connected to Couchbase
-
-
-## Setting Up Collections in Couchbase
-
-The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:
-
-1. 
Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-09-15 12:43:04,085 - INFO - Bucket 'query-vector-search-testing' exists. - - - 2025-09-15 12:43:04,101 - INFO - Collection 'cohere' already exists. Skipping creation. 
-
-    2025-09-15 12:43:06,191 - INFO - All documents cleared from the collection.
-    2025-09-15 12:43:06,193 - INFO - Bucket 'query-vector-search-testing' exists.
-    2025-09-15 12:43:06,199 - INFO - Collection 'cache' already exists. Skipping creation.
-    2025-09-15 12:43:08,367 - INFO - All documents cleared from the collection.
-
-
-# Create Embeddings
-Embeddings are created using the Cohere API. Embeddings are vectors (arrays of numbers) that represent the meaning of text in a high-dimensional space. These embeddings are crucial for tasks like semantic search, where the goal is to find text that is semantically similar to a query. The script uses a pre-trained model provided by Cohere to generate embeddings for the text in the BBC News dataset.
-
-
-```python
-try:
-    embeddings = CohereEmbeddings(
-        cohere_api_key=COHERE_API_KEY,
-        model="embed-english-v3.0",
-    )
-    logging.info("Successfully created CohereEmbeddings")
-except Exception as e:
-    raise ValueError(f"Error creating CohereEmbeddings: {str(e)}")
-```
-
-    2025-09-22 12:56:36,813 - INFO - Successfully created CohereEmbeddings
-
-
-# Set Up Vector Store
-The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched.
-
-
-```python
-try:
-    vector_store = CouchbaseQueryVectorStore(
-        cluster=cluster,
-        bucket_name=CB_BUCKET_NAME,
-        scope_name=SCOPE_NAME,
-        collection_name=COLLECTION_NAME,
-        embedding=embeddings,
-        distance_metric=DistanceStrategy.COSINE
-    )
-    logging.info("Successfully created vector store")
-except Exception as e:
-    raise ValueError(f"Failed to create vector store: {str(e)}")
-```
-
-    2025-09-22 12:56:39,259 - INFO - Successfully created vector store
-
-
-# Load the BBC News Dataset
-To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.
-
-The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.
-
-
-```python
-try:
-    news_dataset = load_dataset(
-        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
-    )
-    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
-    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
-except Exception as e:
-    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
-```
-
-    2025-09-15 12:43:32,383 - INFO - Successfully loaded the BBC News dataset with 2687 rows.
-
-
-    Loaded the BBC News dataset with 2687 rows
-
-
-## Cleaning up the Data
-We will use the content of the news articles for our RAG system.
-
-The dataset contains a few duplicate records.
We are removing them to avoid duplicate results in the retrieval stage of our RAG system. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -## Saving Data to the Vector Store -To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. - -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. Memory Efficiency: Processing in smaller batches prevents memory overload -2. Progress Tracking: Easier to monitor and track the ingestion progress -3. Resource Management: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. -The optimal batch size depends on many factors including: -- Document sizes being inserted -- Available system resources -- Network conditions -- Concurrent workload - -Consider measuring performance with your specific workload before adjusting. - - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") - -``` - - 2025-09-15 12:45:26,834 - INFO - Document ingestion completed successfully. - - -# Create Language Model (LLM) -The script initializes a Cohere language model (LLM) that will be used for generating responses to queries. LLMs are powerful tools for natural language understanding and generation, capable of producing human-like text based on input prompts. The model is configured with specific parameters, such as the temperature, which controls the randomness of its outputs. - - - -```python -try: - llm = ChatCohere( - cohere_api_key=COHERE_API_KEY, - model="command-a-03-2025", - temperature=0 - ) - logging.info("Successfully created Cohere LLM with model command") -except Exception as e: - raise ValueError(f"Error creating Cohere LLM: {str(e)}") -``` - - 2025-09-22 12:58:23,399 - INFO - Successfully created Cohere LLM with model command - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. 
Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-22 12:59:03,622 - INFO - Semantic search completed in 1.18 seconds - - - - Semantic Search Results (completed in 1.18 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3359, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. 
There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners. "I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." - - Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. "Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him." - -------------------------------------------------------------------------------- - Distance: 0.3477, Text: 'We have to find a way' - Guardiola vows to end relegation form - - This video can not be played To play this video you need to enable JavaScript in your browser. 'Worrying' and 'staggering' - Why do Manchester City keep conceding? - - Manchester City are currently in relegation form and there is little sign of it ending. Saturday's 2-1 defeat at Aston Villa left them joint bottom of the form table over the past eight games with just Southampton for company. Saints, at the foot of the Premier League, have the same number of points, four, as City over their past eight matches having won one, drawn one and lost six - the same record as the floundering champions. And if Southampton - who appointed Ivan Juric as their new manager on Saturday - get at least a point at Fulham on Sunday, City will be on the worst run in the division. Even Wolves, who sacked boss Gary O'Neil last Sunday and replaced him with Vitor Pereira, have earned double the number of points during the same period having played a game fewer. They are damning statistics for Pep Guardiola, even if he does have some mitigating circumstances with injuries to Ederson, Nathan Ake and Ruben Dias - who all missed the loss at Villa Park - and the long-term loss of midfield powerhouse Rodri. Guardiola was happy with Saturday's performance, despite defeat in Birmingham, but there is little solace to take at slipping further out of the title race. 
He may have needed to field a half-fit Manuel Akanji and John Stones at Villa Park but that does not account for City looking a shadow of their former selves. That does not justify the error Josko Gvardiol made to gift Jhon Duran a golden chance inside the first 20 seconds, or £100m man Jack Grealish again failing to have an impact on a game. There may be legitimate reasons for City's drop off, whether that be injuries, mental fatigue or just simply a team coming to the end of its lifecycle, but their form, which has plunged off a cliff edge, would have been unthinkable as they strolled to a fourth straight title last season. "The worrying thing is the number of goals conceded," said ex-England captain Alan Shearer on BBC Match of the Day. "The number of times they were opened up because of the lack of protection and legs in midfield was staggering. There are so many things that are wrong at this moment in time." - - This video can not be played To play this video you need to enable JavaScript in your browser. Man City 'have to find a way' to return to form - Guardiola - - Afterwards Guardiola was calm, so much so it was difficult to hear him in the news conference, a contrast to the frustrated figure he cut on the touchline. He said: "It depends on us. The solution is bring the players back. We have just one central defender fit, that is difficult. We are going to try next game - another opportunity and we don't think much further than that. "Of course there are more reasons. We concede the goals we don't concede in the past, we [don't] score the goals we score in the past. Football is not just one reason. There are a lot of little factors. "Last season we won the Premier League, but we came here and lost. We have to think positive and I have incredible trust in the guys. Some of them have incredible pride and desire to do it. We have to find a way, step by step, sooner or later to find a way back." Villa boss Unai Emery highlighted City's frailties, saying he felt Villa could seize on the visitors' lack of belief. "Manchester City are a little bit under the confidence they have normally," he said. "The second half was different, we dominated and we scored. Through those circumstances they were feeling worse than even in the first half." - - Erling Haaland had one touch in the Villa box - - There are chinks in the armour never seen before at City under Guardiola and Erling Haaland conceded belief within the squad is low. He told TNT after the game: "Of course, [confidence levels are] not the best. We know how important confidence is and you can see that it affects every human being. That is how it is, we have to continue and stay positive even though it is difficult." Haaland, with 76 goals in 83 Premier League appearances since joining City from Borussia Dortmund in 2022, had one shot and one touch in the Villa box. His 18 touches in the whole game were the lowest of all starting players and he has been self critical, despite scoring 13 goals in the top flight this season. Over City's last eight games he has netted just twice though, but Guardiola refused to criticise his star striker. He said: "Without him we will be even worse but I like the players feeling that way. I don't agree with Erling. He needs to have the balls delivered in the right spots but he will fight for the next one." - -------------------------------------------------------------------------------- - Distance: 0.3677, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - - ... 
(output truncated for brevity) - - -# Optimizing Vector Search with Global Secondary Index (GSI) - -While the above semantic search using `similarity_search_with_score` works effectively, we can significantly improve query performance by leveraging Global Secondary Index (GSI) in Couchbase. - -Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types: - -Hyperscale Vector Indexes (BHIVE) -- Best for pure vector searches - content discovery, recommendations, semantic search -- High performance with low memory footprint - designed to scale to billions of vectors -- Optimized for concurrent operations - supports simultaneous searches and inserts -- Use when: You primarily perform vector-only queries without complex scalar filtering -- Ideal for: Large-scale semantic search, recommendation systems, content discovery - -Composite Vector Indexes -- Best for filtered vector searches - combines vector search with scalar value filtering -- Efficient pre-filtering - scalar attributes reduce the vector comparison scope -- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries - -Choosing the Right Index Type -- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For more details, see the [Couchbase Vector Index documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html). - - -## Understanding Index Configuration (Couchbase 8.0 Feature) - -The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization: - -Format: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}` - -Centroids (IVF - Inverted File): -- Controls how the dataset is subdivided for faster searches -- More centroids = faster search, slower training -- Fewer centroids = slower search, faster training -- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size - -Quantization Options: -- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension) -- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8) -- Higher values = better accuracy, larger index size - -Common Examples: -- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default) -- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization -- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits - -For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings). - -In the code below, we demonstrate creating a BHIVE index. This method takes an index type (BHIVE or COMPOSITE) and a description parameter for optimization settings. Alternatively, GSI indexes can be created manually from the Couchbase UI. - - -```python -vector_store.create_index(index_type=IndexType.BHIVE, index_name="cohere_bhive_index", index_description="IVF,SQ8") -``` - -The example below shows running the same similarity search, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data.
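- -Before re-running the search, you can optionally confirm that the new index is built and online by querying Couchbase's `system:indexes` catalog. The snippet below is a minimal, optional sketch: it assumes the `cluster` connection created earlier and the `cohere_bhive_index` name used above. - - -```python -# Optional sanity check (sketch): look up the BHIVE index in system:indexes. -# Assumes the `cluster` object from the connection step and the index name -# "cohere_bhive_index" created above; the state should read "online". -result = cluster.query( -    "SELECT name, state FROM system:indexes WHERE name = 'cohere_bhive_index'" -) -for row in result: -    print(f"Index {row['name']} is {row['state']}") -```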
- -**Important**: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria. - -Note: In GSI vector search, the distance represents the vector distance between the query and document embeddings. A lower distance indicates higher similarity, while a higher distance indicates lower similarity. - - -```python -query = "What was manchester city manager pep guardiola's reaction to the team's current form?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-22 12:59:26,949 - INFO - Semantic search completed in 0.38 seconds - - - - Semantic Search Results (completed in 0.38 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3359, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners.
"I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." - - Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. "Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him." - -------------------------------------------------------------------------------- - Distance: 0.3477, Text: 'We have to find a way' - Guardiola vows to end relegation form - - This video can not be played To play this video you need to enable JavaScript in your browser. 'Worrying' and 'staggering' - Why do Manchester City keep conceding? - - Manchester City are currently in relegation form and there is little sign of it ending. Saturday's 2-1 defeat at Aston Villa left them joint bottom of the form table over the past eight games with just Southampton for company. Saints, at the foot of the Premier League, have the same number of points, four, as City over their past eight matches having won one, drawn one and lost six - the same record as the floundering champions. And if Southampton - who appointed Ivan Juric as their new manager on Saturday - get at least a point at Fulham on Sunday, City will be on the worst run in the division. Even Wolves, who sacked boss Gary O'Neil last Sunday and replaced him with Vitor Pereira, have earned double the number of points during the same period having played a game fewer. They are damning statistics for Pep Guardiola, even if he does have some mitigating circumstances with injuries to Ederson, Nathan Ake and Ruben Dias - who all missed the loss at Villa Park - and the long-term loss of midfield powerhouse Rodri. Guardiola was happy with Saturday's performance, despite defeat in Birmingham, but there is little solace to take at slipping further out of the title race. He may have needed to field a half-fit Manuel Akanji and John Stones at Villa Park but that does not account for City looking a shadow of their former selves. That does not justify the error Josko Gvardiol made to gift Jhon Duran a golden chance inside the first 20 seconds, or £100m man Jack Grealish again failing to have an impact on a game. There may be legitimate reasons for City's drop off, whether that be injuries, mental fatigue or just simply a team coming to the end of its lifecycle, but their form, which has plunged off a cliff edge, would have been unthinkable as they strolled to a fourth straight title last season. 
"The worrying thing is the number of goals conceded," said ex-England captain Alan Shearer on BBC Match of the Day. "The number of times they were opened up because of the lack of protection and legs in midfield was staggering. There are so many things that are wrong at this moment in time." - - This video can not be played To play this video you need to enable JavaScript in your browser. Man City 'have to find a way' to return to form - Guardiola - - Afterwards Guardiola was calm, so much so it was difficult to hear him in the news conference, a contrast to the frustrated figure he cut on the touchline. He said: "It depends on us. The solution is bring the players back. We have just one central defender fit, that is difficult. We are going to try next game - another opportunity and we don't think much further than that. "Of course there are more reasons. We concede the goals we don't concede in the past, we [don't] score the goals we score in the past. Football is not just one reason. There are a lot of little factors. "Last season we won the Premier League, but we came here and lost. We have to think positive and I have incredible trust in the guys. Some of them have incredible pride and desire to do it. We have to find a way, step by step, sooner or later to find a way back." Villa boss Unai Emery highlighted City's frailties, saying he felt Villa could seize on the visitors' lack of belief. "Manchester City are a little bit under the confidence they have normally," he said. "The second half was different, we dominated and we scored. Through those circumstances they were feeling worse than even in the first half." - - Erling Haaland had one touch in the Villa box - - There are chinks in the armour never seen before at City under Guardiola and Erling Haaland conceded belief within the squad is low. He told TNT after the game: "Of course, [confidence levels are] not the best. We know how important confidence is and you can see that it affects every human being. That is how it is, we have to continue and stay positive even though it is difficult." Haaland, with 76 goals in 83 Premier League appearances since joining City from Borussia Dortmund in 2022, had one shot and one touch in the Villa box. His 18 touches in the whole game were the lowest of all starting players and he has been self critical, despite scoring 13 goals in the top flight this season. Over City's last eight games he has netted just twice though, but Guardiola refused to criticise his star striker. He said: "Without him we will be even worse but I like the players feeling that way. I don't agree with Erling. He needs to have the balls delivered in the right spots but he will fight for the next one." - -------------------------------------------------------------------------------- - Distance: 0.3677, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - - ... (output truncated for brevity) - - -Note: To create a COMPOSITE index, the below code can be used. -Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles. - - -```python -vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="cohere_composite_index", index_description="IVF,SQ8") -``` - -# Set Up Cache - A cache is set up using Couchbase to store intermediate results and frequently accessed data. 
Caching is important for improving performance, as it reduces the need to repeatedly calculate or retrieve the same data. The cache is linked to a specific collection in Couchbase, and it is used later in the script to store the results of language model queries. - - - -```python -try: - cache = CouchbaseCache( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=CACHE_COLLECTION, - ) - logging.info("Successfully created cache") - set_llm_cache(cache) -except Exception as e: - raise ValueError(f"Failed to create cache: {str(e)}") -``` - - 2025-09-22 12:59:40,381 - INFO - Successfully created cache - - -# Retrieval-Augmented Generation (RAG) with Couchbase and LangChain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -try: - template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below: - {context} - - Question: {question}""" - prompt = ChatPromptTemplate.from_template(template) - - rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | prompt - | llm - | StrOutputParser() - ) - logging.info("Successfully created RAG chain") -except Exception as e: - raise ValueError(f"Error creating RAG chain: {str(e)}") -``` - - 2025-09-15 12:53:46,979 - INFO - Successfully created RAG chain - - - -```python -start_time = time.time() -try: - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: Manchester City manager Pep Guardiola has expressed concern and frustration over the team's recent form, describing it as the "worst run of results" in his managerial career. He has admitted that the situation has affected his sleep and diet, stating that his state of mind is "ugly" and his sleep is "worse." Guardiola has also acknowledged the need for the team to defend better and avoid making mistakes at both ends of the pitch.
Despite the challenges, he remains focused on finding solutions and has emphasized the importance of bringing injured players back to the squad. Guardiola has also highlighted the need for the team to recover its essence by improving defensive concepts and re-establishing the intensity they are known for. He has taken a self-critical approach, stating that he is "not good enough" to resolve the situation with the current group of players and has vowed to find solutions to turn the team's form around. - RAG response generated in 4.09 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. - - -```python -try: - queries = [ - "What happened in the match between Fullham and Liverpool?", - "What was manchester city manager pep guardiola's reaction to the team's current form?", # Repeated query - "What happened in the match between Fullham and Liverpool?", # Repeated query - ] - - for i, query in enumerate(queries, 1): - print(f"\nQuery {i}: {query}") - start_time = time.time() - response = rag_chain.invoke(query) - elapsed_time = time.time() - start_time - print(f"Response: {response}") - print(f"Time taken: {elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - - Query 1: What happened in the match between Fullham and Liverpool? - Response: In the match between Fulham and Liverpool, Liverpool played with 10 men for 89 minutes after Andy Robertson received a red card in the 17th minute. Despite this numerical disadvantage, Liverpool managed to secure a 2-2 draw at Anfield. Fulham took the lead twice, but Liverpool responded both times, with Diogo Jota scoring an 86th-minute equalizer. The performance highlighted Liverpool's resilience and title credentials, with Fulham's Antonee Robinson praising Liverpool for not seeming like they were a man down. Liverpool maintained over 60% possession and dominated attacking metrics, showcasing their ability to fight back under adversity. - Time taken: 2.12 seconds - - Query 2: What was manchester city manager pep guardiola's reaction to the team's current form? 
- Response: Manchester City manager Pep Guardiola has expressed concern and frustration over the team's recent form, describing it as the "worst run of results" in his managerial career. He has admitted that the situation has affected his sleep and diet, stating that his state of mind is "ugly" and his sleep is "worse." Guardiola has also acknowledged the need for the team to defend better and avoid making mistakes at both ends of the pitch. Despite the challenges, he remains focused on finding solutions and has emphasized the importance of bringing injured players back to the squad. Guardiola has also highlighted the need for the team to recover its essence by improving defensive concepts and re-establishing the intensity they are known for. He has taken a self-critical approach, stating that he is "not good enough" to resolve the situation with the current group of players and has vowed to find solutions to turn the team's form around. - Time taken: 0.35 seconds - - Query 3: What happened in the match between Fullham and Liverpool? - Response: In the match between Fulham and Liverpool, Liverpool played with 10 men for 89 minutes after Andy Robertson received a red card in the 17th minute. Despite this numerical disadvantage, Liverpool managed to secure a 2-2 draw at Anfield. Fulham took the lead twice, but Liverpool responded both times, with Diogo Jota scoring an 86th-minute equalizer. The performance highlighted Liverpool's resilience and title credentials, with Fulham's Antonee Robinson praising Liverpool for not seeming like they were a man down. Liverpool maintained over 60% possession and dominated attacking metrics, showcasing their ability to fight back under adversity. - Time taken: 0.35 seconds - - -## Conclusion -By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and Cohere. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and of how GSI-based vector indexes make querying your data more efficient, which can significantly improve RAG performance. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine. diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_CrewAI.md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_CrewAI.md deleted file mode 100644 index b268a40..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_CrewAI.md +++ /dev/null @@ -1,929 +0,0 @@ ---- -# frontmatter -path: "/tutorial-crewai-couchbase-rag-with-global-secondary-index" -title: Retrieval-Augmented Generation (RAG) with Couchbase and CrewAI with GSI -short_title: RAG with Couchbase and CrewAI with GSI -description: - - Learn how to build a semantic search engine using Couchbase and CrewAI. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with CrewAI's agent-based approach. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain, CrewAI, and Couchbase with GSI.
-content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - LangChain - - CrewAI -sdk_language: - - python -length: 60 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/crewai/gsi/RAG_with_Couchbase_and_CrewAI.ipynb) - -# Agent-Based RAG with Couchbase GSI Vector Search and CrewAI - -## Overview - -In this guide, we will walk you through building a powerful semantic search engine using [Couchbase](https://www.couchbase.com) as the backend database and [CrewAI](https://github.com/crewAIInc/crewAI) for agent-based RAG operations. CrewAI allows us to create specialized agents that can work together to handle different aspects of the RAG workflow, from document retrieval to response generation. This tutorial uses Couchbase's **Global Secondary Index (GSI)** vector search capabilities, which offer high-performance vector search optimized for large-scale applications. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. Alternatively, if you want to perform semantic search using the FTS index, please take a look at [this tutorial](https://developer.couchbase.com/tutorial-crewai-couchbase-rag-using-fts). - -## How to Run This Tutorial - -This tutorial is available as a Jupyter Notebook (.ipynb file) that you can run interactively. You can access the original notebook via the View Source link above. - -You can either: -- Download the notebook file and run it on [Google Colab](https://colab.research.google.com) -- Run it on your system by setting up the Python environment - -## Prerequisites - -### Couchbase Requirements - -1. Create and Deploy Your Free Tier Operational cluster on [Capella](https://cloud.couchbase.com/sign-up) - - To get started with [Couchbase Capella](https://cloud.couchbase.com), create an account and use it to deploy a free tier operational cluster - - This account provides you with an environment where you can explore and learn about Capella - - To learn more, please follow the [Getting Started Guide](https://docs.couchbase.com/cloud/get-started/create-account.html) - - **Important**: This tutorial requires Couchbase Server **8.0+** for GSI vector search capabilities - -### Couchbase Capella Configuration - -When running Couchbase using Capella, the following prerequisites need to be met: -- Create the database credentials to access the required bucket (Read and Write) used in the application -- Allow access to the Cluster from the IP on which the application is running by following the [Network Security documentation](https://docs.couchbase.com/cloud/security/security.html#public-access) - -## Setup and Installation - -### Installing Necessary Libraries - -We'll install the following key libraries: -- `datasets`: For loading and managing our training data -- `langchain-couchbase`: To integrate Couchbase with LangChain for GSI vector storage and caching -- `langchain-openai`: For accessing OpenAI's embedding and chat models -- `crewai`: To create and orchestrate our AI agents for RAG operations -- `python-dotenv`: For securely managing environment variables and API keys - -These libraries provide the foundation for building a semantic search engine with GSI vector embeddings, database integration, and agent-based RAG capabilities.
- - -```python -%pip install --quiet datasets==4.1.0 langchain-couchbase==0.5.0 langchain-openai==0.3.33 crewai==0.186.1 python-dotenv==1.1.1 -``` - - Note: you may need to restart the kernel to use updated packages. - - -### Import Required Modules - -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. - - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta -from uuid import uuid4 - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.diagnostics import PingState, ServiceType -from couchbase.exceptions import (InternalServerFailureException, - QueryIndexAlreadyExistsException, - ServiceUnavailableException, - CouchbaseException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from crewai.tools import tool -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy, IndexType -from langchain_openai import ChatOpenAI, OpenAIEmbeddings - -from crewai import Agent, Crew, Process, Task -``` - -### Configure Logging - -Logging is configured to track the progress of the script and capture any errors or warnings. - - -```python -logging.basicConfig( - level=logging.INFO, - format='%(asctime)s [%(levelname)s] %(message)s', - datefmt='%Y-%m-%d %H:%M:%S' -) - -# Suppress httpx logging -logging.getLogger('httpx').setLevel(logging.CRITICAL) -``` - -### Load Environment Configuration - -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script uses environment variables to store sensitive information, enhancing the overall security and maintainability of your code by avoiding hardcoded values. - - -```python -# Load environment variables -load_dotenv("./.env") - -# Configuration -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or input("Enter your OpenAI API key: ") -if not OPENAI_API_KEY: - raise ValueError("OPENAI_API_KEY is not set") - -CB_HOST = os.getenv('CB_HOST') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or 'vector-search-testing' -SCOPE_NAME = os.getenv('SCOPE_NAME') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or 'crew' - -print("Configuration loaded successfully") -``` - - Configuration loaded successfully - - -## Couchbase Connection Setup - -### Connect to Cluster - -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. 
- - -```python -# Connect to Couchbase -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - print("Successfully connected to Couchbase") -except Exception as e: - print(f"Failed to connect to Couchbase: {str(e)}") - raise -``` - - Successfully connected to Couchbase - - -### Setup Collections - -Create and configure Couchbase bucket, scope, and collection for storing our vector data. - -1. **Bucket Creation:** - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: If you are using Capella, create a bucket manually called vector-search-testing(or any name you prefer) with the same properties. - -2. **Scope Management:** - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. **Collection Setup:** - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -**Additional Tasks:** -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -``` - - 2025-10-06 10:17:53 [INFO] Bucket 'vector-search-testing' exists. - 2025-10-06 10:17:53 [INFO] Collection 'crew' already exists. Skipping creation. - 2025-10-06 10:17:55 [INFO] All documents cleared from the collection. - - - -## Understanding GSI Vector Search - -### GSI Vector Index Configuration - -Semantic search with GSI requires creating a Global Secondary Index optimized for vector operations. Unlike FTS-based vector search, GSI vector indexes offer two distinct types optimized for different use cases: - -#### GSI Vector Index Types - -##### Hyperscale Vector Indexes (BHIVE) - -- **Best for**: Pure vector searches like content discovery, recommendations, and semantic search -- **Performance**: High performance with low memory footprint, optimized for concurrent operations -- **Scalability**: Designed to scale to billions of vectors -- **Use when**: You primarily perform vector-only queries without complex scalar filtering - -##### Composite Vector Indexes - -- **Best for**: Filtered vector searches that combine vector search with scalar value filtering -- **Performance**: Efficient pre-filtering where scalar attributes reduce the vector comparison scope -- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- **Note**: Scalar filters take precedence over vector similarity - -#### Understanding Index Configuration - -The `index_description` parameter controls how Couchbase optimizes vector storage and search through centroids and quantization: - -**Format**: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}` - -**Centroids (IVF - Inverted File):** -- Controls how the dataset is subdivided for faster searches -- More centroids = faster search, slower training -- Fewer centroids = slower search, faster training -- If omitted (like IVF,SQ8), Couchbase auto-selects based on dataset size - -**Quantization Options:** -- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension) -- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8) -- Higher values = better accuracy, larger index size - -**Common Examples:** -- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default) -- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization -- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits - -For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings). - -For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).
- - -```python -# GSI Vector Index Configuration -# Unlike FTS indexes, GSI vector indexes are created programmatically through the vector store -# We'll configure the parameters that will be used for index creation - -# Vector configuration -DISTANCE_STRATEGY = DistanceStrategy.COSINE # Cosine similarity -INDEX_TYPE = IndexType.BHIVE # Using BHIVE for high-performance vector searches -INDEX_DESCRIPTION = "IVF,SQ8" # Auto-selected centroids with 8-bit scalar quantization - -# To create a Composite Index instead, use the following: -# INDEX_TYPE = IndexType.COMPOSITE # Combines vector search with scalar filtering - -print("GSI vector index configuration prepared") -``` - - GSI vector index configuration prepared - - -### Alternative: Composite Index Configuration - -If your use case requires complex filtering with scalar attributes, you can create a **Composite index** instead by changing the configuration: - -```python -# Alternative configuration for Composite index -INDEX_TYPE = IndexType.COMPOSITE # Instead of IndexType.BHIVE -INDEX_DESCRIPTION = "IVF,SQ8" # Same quantization settings -DISTANCE_STRATEGY = DistanceStrategy.COSINE # Same distance metric - -# The rest of the setup remains identical -``` - -**Use Composite indexes when:** -- You need to filter by document metadata or attributes before vector similarity -- Your queries combine vector search with WHERE clauses -- You have well-defined filtering requirements that can reduce the search space - -**Note**: The index creation process is identical - just change the `INDEX_TYPE`. Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications requiring complex query patterns with metadata filtering. - -## OpenAI Configuration - -This section initializes two key OpenAI components needed for our RAG system: - -1. **OpenAI Embeddings:** - - Uses the 'text-embedding-3-small' model - - Converts text into high-dimensional vector representations (embeddings) - - These embeddings enable semantic search by capturing the meaning of text - - Required for vector similarity search in Couchbase - -2. **ChatOpenAI Language Model:** - - Uses the 'gpt-4o' model - - Temperature set to 0.2 for balanced creativity and focus - - Serves as the cognitive engine for CrewAI agents - - Powers agent reasoning, decision-making, and task execution - - Enables agents to: - - Process and understand retrieved context from vector search - - Generate thoughtful responses based on that context - - Follow instructions defined in agent roles and goals - - Collaborate with other agents in the crew - - The relatively low temperature (0.2) ensures agents produce reliable, consistent outputs while maintaining some creative problem-solving ability - -Both components require a valid OpenAI API key (OPENAI_API_KEY) for authentication. -In the CrewAI framework, the LLM acts as the "brain" for each agent, allowing them to interpret tasks, retrieve relevant information via the RAG system, and generate appropriate outputs based on their specialized roles and expertise.
- - -```python -# Initialize OpenAI components -embeddings = OpenAIEmbeddings( - openai_api_key=OPENAI_API_KEY, - model="text-embedding-3-small" -) - -llm = ChatOpenAI( - openai_api_key=OPENAI_API_KEY, - model="gpt-4o", - temperature=0.2 -) - -print("OpenAI components initialized") -``` - - OpenAI components initialized - - -## Document Processing and Vector Store Setup - -### Create Couchbase GSI Vector Store - -Set up the GSI vector store where we'll store document embeddings for high-performance semantic search. - - -```python -# Setup GSI vector store with OpenAI embeddings -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - distance_metric=DISTANCE_STRATEGY - ) - print("GSI Vector store initialized successfully") - logging.info("GSI Vector store setup completed") -except Exception as e: - logging.error(f"Failed to initialize GSI vector store: {str(e)}") - raise RuntimeError(f"GSI Vector store initialization failed: {str(e)}") -``` - - 2025-10-06 10:18:05 [INFO] GSI Vector store setup completed - - - GSI Vector store initialized successfully - - -### Load BBC News Dataset - -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-10-06 10:18:13 [INFO] Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -#### Data Cleaning - -Remove duplicate articles for cleaner search results. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -#### Save Data to Vector Store - -To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process. - -We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database.
The batch_size parameter controls how many articles are processed in each iteration. - -This approach offers several benefits: -1. **Memory Efficiency**: Processing in smaller batches prevents memory overload -2. **Error Handling**: If an error occurs, only the current batch is affected -3. **Progress Tracking**: Easier to monitor and track the ingestion progress -4. **Resource Management**: Better control over CPU and network resource utilization - -We use a conservative batch size of 50 to ensure reliable operation. The optimal batch size depends on many factors including document sizes, available system resources, network conditions, and concurrent workload. - - -```python -batch_size = 50 - -# Automatic Batch Processing -articles = [article for article in unique_news_articles if article and len(article) <= 50000] - -try: - vector_store.add_texts( - texts=articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully.") -except Exception as e: - raise ValueError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-10-06 10:19:43 [INFO] Document ingestion completed successfully. - - -## Vector Search Performance Testing - -Now let's demonstrate the performance benefits of GSI optimization by testing pure vector search performance. We'll compare three optimization levels: - -1. **Baseline Performance**: Vector search without GSI optimization -2. **GSI-Optimized Performance**: Same search with BHIVE GSI index -3. **Cache Benefits**: Show how caching can be applied on top of GSI for repeated queries - -**Important**: This testing focuses on pure vector search performance, isolating the GSI improvements from other workflow overhead. - -### Create Vector Search Function - - -```python -import time - -# Create GSI vector retriever optimized for high-performance searches -retriever = vector_store.as_retriever( - search_type="similarity", - search_kwargs={"k": 4} # Return top 4 most similar documents -) - -def test_vector_search_performance(query_text, label="Vector Search"): - """Test pure vector search performance and return timing metrics""" - print(f"\n[{label}] Testing vector search performance") - print(f"[{label}] Query: '{query_text}'") - - start_time = time.time() - - try: - # Perform vector search using the retriever - docs = retriever.invoke(query_text) - end_time = time.time() - - search_time = end_time - start_time - print(f"[{label}] Vector search completed in {search_time:.4f} seconds") - print(f"[{label}] Found {len(docs)} relevant documents") - - # Show a preview of the first result - if docs: - preview = docs[0].page_content[:100] + "..." if len(docs[0].page_content) > 100 else docs[0].page_content - print(f"[{label}] Top result preview: {preview}") - - return search_time - except Exception as e: - print(f"[{label}] Vector search failed: {str(e)}") - return None -``` - -### Test 1: Baseline Performance (No GSI Index) - -Test pure vector search performance without GSI optimization. - - -```python -# Test baseline vector search performance without GSI index -test_query = "What are the latest developments in football transfers?" -print("Testing baseline vector search performance without GSI optimization...") -baseline_time = test_vector_search_performance(test_query, "Baseline Search") -print(f"\nBaseline vector search time (without GSI): {baseline_time:.4f} seconds\n") -``` - - Testing baseline vector search performance without GSI optimization... 
- - [Baseline Search] Testing vector search performance - [Baseline Search] Query: 'What are the latest developments in football transfers?' - [Baseline Search] Vector search completed in 1.3999 seconds - [Baseline Search] Found 4 relevant documents - [Baseline Search] Top result preview: The latest updates and analysis from the BBC. - - Baseline vector search time (without GSI): 1.3999 seconds - - - -### Create BHIVE GSI Index - -Now let's create a BHIVE GSI vector index to enable high-performance vector searches. The index creation is done programmatically through the vector store, which will optimize the index settings based on our data and requirements. - - -```python -# Create GSI Vector Index for high-performance searches -print("Creating BHIVE GSI vector index...") -try: - # Create a BHIVE index optimized for pure vector searches - vector_store.create_index( - index_type=INDEX_TYPE, # BHIVE index type - index_description=INDEX_DESCRIPTION # IVF,SQ8 for optimized performance - ) - print(f"GSI Vector index created successfully") - logging.info(f"BHIVE index created with description '{INDEX_DESCRIPTION}'") - - # Wait a moment for index to be available - print("Waiting for index to become available...") - time.sleep(5) - -except Exception as e: - # Index might already exist, which is fine - if "already exists" in str(e).lower(): - print(f"GSI Vector index already exists, proceeding...") - logging.info(f"Index already exists") - else: - logging.error(f"Failed to create GSI index: {str(e)}") - raise RuntimeError(f"GSI index creation failed: {str(e)}") -``` - - Creating BHIVE GSI vector index... - - - 2025-10-06 10:20:15 [INFO] BHIVE index created with description 'IVF,SQ8' - - - GSI Vector index created successfully - Waiting for index to become available... - - -### Test 2: GSI-Optimized Performance - -Test the same vector search with BHIVE GSI optimization. - - -```python -# Test vector search performance with GSI index -print("Testing vector search performance with BHIVE GSI optimization...") -gsi_search_time = test_vector_search_performance(test_query, "GSI-Optimized Search") -``` - - Testing vector search performance with BHIVE GSI optimization... - - [GSI-Optimized Search] Testing vector search performance - [GSI-Optimized Search] Query: 'What are the latest developments in football transfers?' - [GSI-Optimized Search] Vector search completed in 0.5885 seconds - [GSI-Optimized Search] Found 4 relevant documents - [GSI-Optimized Search] Top result preview: Four key areas for Everton's new owners to address - - Everton fans last saw silverware in 1995 when th... - - -### Test 3: Cache Benefits Testing - -Now let's demonstrate how caching can improve performance for repeated queries. **Note**: Caching benefits apply to both baseline and GSI-optimized searches. - - -```python -# Test cache benefits with a different query to avoid interference -cache_test_query = "What happened in the latest Premier League matches?" - -print("Testing cache benefits with vector search...") -print("First execution (cache miss):") -cache_time_1 = test_vector_search_performance(cache_test_query, "Cache Test - First Run") - -print("\nSecond execution (cache hit - should be faster):") -cache_time_2 = test_vector_search_performance(cache_test_query, "Cache Test - Second Run") -``` - - Testing cache benefits with vector search... - First execution (cache miss): - - [Cache Test - First Run] Testing vector search performance - [Cache Test - First Run] Query: 'What happened in the latest Premier League matches?' 
- [Cache Test - First Run] Vector search completed in 0.6450 seconds - [Cache Test - First Run] Found 4 relevant documents - [Cache Test - First Run] Top result preview: Who has made Troy's Premier League team of the week? - - After every round of Premier League matches th... - - Second execution (cache hit - should be faster): - - [Cache Test - Second Run] Testing vector search performance - [Cache Test - Second Run] Query: 'What happened in the latest Premier League matches?' - [Cache Test - Second Run] Vector search completed in 0.4306 seconds - [Cache Test - Second Run] Found 4 relevant documents - [Cache Test - Second Run] Top result preview: Who has made Troy's Premier League team of the week? - - After every round of Premier League matches th... - - -### Vector Search Performance Analysis - -Let's analyze the vector search performance improvements across all optimization levels: - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"Phase 1 - Baseline Search (No GSI): {baseline_time:.4f} seconds") -print(f"Phase 2 - GSI-Optimized Search: {gsi_search_time:.4f} seconds") -if cache_time_1 and cache_time_2: - print(f"Phase 3 - Cache Benefits:") - print(f" First execution (cache miss): {cache_time_1:.4f} seconds") - print(f" Second execution (cache hit): {cache_time_2:.4f} seconds") - -print("\n" + "-"*80) -print("VECTOR SEARCH OPTIMIZATION IMPACT:") -print("-"*80) - -# GSI improvement analysis -if baseline_time and gsi_search_time: - speedup = baseline_time / gsi_search_time if gsi_search_time > 0 else float('inf') - time_saved = baseline_time - gsi_search_time - percent_improvement = (time_saved / baseline_time) * 100 - print(f"GSI Index Benefit: {speedup:.2f}x faster ({percent_improvement:.1f}% improvement)") - -# Cache improvement analysis -if cache_time_1 and cache_time_2 and cache_time_2 < cache_time_1: - cache_speedup = cache_time_1 / cache_time_2 - cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100 - print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)") -else: - print(f"Cache Benefit: Variable (depends on query complexity and caching mechanism)") - -print(f"\nKey Insights for Vector Search Performance:") -print(f"• GSI BHIVE indexes provide significant performance improvements for vector similarity search") -print(f"• Performance gains are most dramatic for complex semantic queries") -print(f"• BHIVE optimization is particularly effective for high-dimensional embeddings") -print(f"• Combined with proper quantization (SQ8), GSI delivers production-ready performance") -print(f"• These performance improvements directly benefit any application using the vector store") -``` - - - ================================================================================ - VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - Phase 1 - Baseline Search (No GSI): 1.3999 seconds - Phase 2 - GSI-Optimized Search: 0.5885 seconds - Phase 3 - Cache Benefits: - First execution (cache miss): 0.6450 seconds - Second execution (cache hit): 0.4306 seconds - - -------------------------------------------------------------------------------- - VECTOR SEARCH OPTIMIZATION IMPACT: - -------------------------------------------------------------------------------- - GSI Index Benefit: 2.38x faster (58.0% improvement) - Cache Benefit: 1.50x faster (33.2% improvement) - - Key Insights for Vector Search Performance: - • 
GSI BHIVE indexes provide significant performance improvements for vector similarity search
    • Performance gains are most dramatic for complex semantic queries
    • BHIVE optimization is particularly effective for high-dimensional embeddings
    • Combined with proper quantization (SQ8), GSI delivers production-ready performance
    • These performance improvements directly benefit any application using the vector store


## CrewAI Agent Setup

### What is CrewAI?

Now that we've optimized our vector search performance, let's build a sophisticated agent-based RAG system using CrewAI. CrewAI enables us to create specialized AI agents that collaborate to handle different aspects of the RAG workflow:

- **Research Agent**: Finds and analyzes relevant documents using our optimized vector search
- **Writer Agent**: Takes research findings and creates polished, structured responses
- **Collaborative Workflow**: Agents work together, with the writer building on the researcher's findings

This multi-agent approach produces higher-quality responses than single-agent systems by separating research and writing expertise, while benefiting from the GSI performance improvements we just demonstrated.

### Create Vector Search Tool


```python
# Define the GSI vector search tool using the @tool decorator
@tool("gsi_vector_search")
def search_tool(query: str) -> str:
    """Search for relevant documents using GSI vector similarity.
    Input should be a simple text query string.
    Returns a list of relevant document contents from GSI vector search.
    Use this tool to find detailed information about topics using high-performance GSI indexes."""

    # Invoke the GSI vector retriever (now optimized with BHIVE index)
    docs = retriever.invoke(query)

    # Format the retrieved document contents for the agent
    formatted_docs = "\n\n".join([
        f"Document {i+1}:\n{'-'*40}\n{doc.page_content}"
        for i, doc in enumerate(docs)
    ])
    return formatted_docs
```

### Create CrewAI Agents


```python
# Create research agent
researcher = Agent(
    role='Research Expert',
    goal='Find and analyze the most relevant documents to answer user queries accurately',
    backstory="""You are an expert researcher with deep knowledge in information retrieval
    and analysis. Your expertise lies in finding, evaluating, and synthesizing information
    from various sources. You have a keen eye for detail and can identify key insights
    from complex documents. You always verify information across multiple sources and
    provide comprehensive, accurate analyses.""",
    tools=[search_tool],
    llm=llm,
    verbose=False,
    memory=True,
    allow_delegation=False
)

# Create writer agent
writer = Agent(
    role='Technical Writer',
    goal='Generate clear, accurate, and well-structured responses based on research findings',
    backstory="""You are a skilled technical writer with expertise in making complex
    information accessible and engaging. You excel at organizing information logically,
    explaining technical concepts clearly, and creating well-structured documents. You
    ensure all information is properly cited, accurate, and presented in a user-friendly
    manner. 
You have a talent for maintaining the reader's interest while conveying - detailed technical information.""", - llm=llm, - verbose=False, - memory=True, - allow_delegation=False -) - -print("CrewAI agents created successfully with optimized GSI vector search") -``` - - CrewAI agents created successfully with optimized GSI vector search - - -### How the Optimized RAG Workflow Works - -The complete optimized RAG process: -1. **User Query** → Research Agent -2. **Vector Search** → GSI BHIVE index finds similar documents (now with proven performance improvements) -3. **Document Analysis** → Research Agent analyzes and synthesizes findings -4. **Response Writing** → Writer Agent creates polished, structured response -5. **Final Output** → User receives comprehensive, well-formatted answer - -**Key Benefit**: The vector search performance improvements we demonstrated directly enhance the agent workflow efficiency. - -## CrewAI Agent Demo - -Now let's demonstrate the complete optimized agent-based RAG system in action, benefiting from the GSI performance improvements we validated earlier. - -### Demo Function - - -```python -def process_interactive_query(query, researcher, writer): - """Run complete RAG workflow with CrewAI agents using optimized GSI vector search""" - print(f"\nProcessing Query: {query}") - print("=" * 80) - - # Create tasks - research_task = Task( - description=f"Research and analyze information relevant to: {query}", - agent=researcher, - expected_output="A detailed analysis with key findings" - ) - - writing_task = Task( - description="Create a comprehensive response", - agent=writer, - expected_output="A clear, well-structured answer", - context=[research_task] - ) - - # Execute crew - crew = Crew( - agents=[researcher, writer], - tasks=[research_task, writing_task], - process=Process.sequential, - verbose=True, - cache=True, - planning=True - ) - - try: - start_time = time.time() - result = crew.kickoff() - elapsed_time = time.time() - start_time - - print(f"\nCompleted in {elapsed_time:.2f} seconds") - print("=" * 80) - print("RESPONSE") - print("=" * 80) - print(result) - - return elapsed_time - except Exception as e: - print(f"Error: {str(e)}") - return None -``` - -### Run Agent-Based RAG Demo - - -```python -# Disable logging for cleaner output -logging.disable(logging.CRITICAL) - -# Run demo with a sample query -demo_query = "What are the key details about the FA Cup third round draw?" -final_time = process_interactive_query(demo_query, researcher, writer) - -if final_time: - print(f"\n\n✅ CrewAI agent demo completed successfully in {final_time:.2f} seconds") -``` - -## Conclusion - -You have successfully built a powerful agent-based RAG system that combines Couchbase's high-performance GSI vector storage capabilities with CrewAI's multi-agent architecture. This tutorial demonstrated the complete pipeline from data ingestion to intelligent response generation, with real performance benchmarks showing the dramatic improvements GSI indexing provides. 
diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Jina_AI.md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Jina_AI.md
deleted file mode 100644
index 224f57b..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Jina_AI.md
+++ /dev/null
@@ -1,926 +0,0 @@
---
# frontmatter
path: "/tutorial-jina-couchbase-rag-with-global-secondary-index"
title: Retrieval-Augmented Generation (RAG) with Couchbase and Jina AI using GSI
short_title: RAG with Couchbase and Jina
description:
  - Learn how to build a semantic search engine using Couchbase and Jina.
  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Jina embeddings and language models.
  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase with GSI.
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - Artificial Intelligence
  - LangChain
  - Jina AI
sdk_language:
  - python
length: 60 Mins
---


[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/jinaai/gsi/RAG_with_Couchbase_and_Jina_AI.ipynb)

# Semantic Search with Couchbase GSI Vector Indexes and Jina AI

## Overview

This tutorial demonstrates building a high-performance semantic search engine using Couchbase's GSI (Global Secondary Index) vector search and Jina AI for embeddings and language models. We'll show measurable performance improvements with GSI optimization and implement a complete RAG (Retrieval-Augmented Generation) system.

**Key Features:**
- High-performance GSI vector search with BHIVE indexing
- Jina AI embeddings and language models
- Performance benchmarks showing GSI benefits
- Complete RAG workflow with caching optimization

**Requirements:** Couchbase Server 8.0+ or Capella with Query Service enabled.

## How to Run This Tutorial

This tutorial is available as a Jupyter Notebook that you can run interactively on [Google Colab](https://colab.research.google.com/) or locally by setting up the Python environment.

## Prerequisites

### System Requirements

- **Couchbase Server 8.0+** or Couchbase Capella
- **Query Service enabled** (required for GSI Vector Indexes)
- **Jina AI API credentials** ([Get them here](https://jina.ai/))
- **JinaChat API credentials** ([Get them here](https://chat.jina.ai/api))

### Couchbase Capella Setup

1. **Create Account:** Deploy a [free tier cluster](https://cloud.couchbase.com/sign-up)
2. **Configure Access:** Set up database credentials and network security
3. **Enable Query Service:** Required for GSI vector search functionality

## Setup and Installation

### Install Required Libraries

Install the necessary packages for Couchbase GSI vector search, Jina AI integration, and LangChain RAG capabilities.


```python
# JinaChat requires openai 0.27; newer openai releases are not supported
%pip install --quiet datasets==3.6.0 langchain-couchbase==0.5.0 langchain-community==0.3.24 openai==0.27 python-dotenv==1.1.0 ipywidgets
```

    Note: you may need to restart the kernel to use updated packages.


### Import Required Modules

Import libraries for Couchbase GSI vector search, Jina AI models, and LangChain components.

```python
import logging
import os
import time
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import CouchbaseException
from couchbase.management.buckets import CreateBucketSettings
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_community.chat_models import JinaChat
from langchain_community.embeddings import JinaEmbeddings
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy
from langchain_couchbase.vectorstores import IndexType
```

### Configure Logging

Set up logging to track progress and capture any errors during execution.


```python
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Suppress all logs from specific loggers
logging.getLogger('openai').setLevel(logging.WARNING)
logging.getLogger('httpx').setLevel(logging.WARNING)
```

### Environment Configuration

Load environment variables for secure access to Jina AI and Couchbase services. Create a `.env` file with your credentials.


```python
load_dotenv("./.env")

JINA_API_KEY = os.getenv("JINA_API_KEY")
JINACHAT_API_KEY = os.getenv("JINACHAT_API_KEY")

CB_HOST = os.getenv("CB_HOST") or 'couchbase://localhost'
CB_USERNAME = os.getenv("CB_USERNAME") or 'Administrator'
CB_PASSWORD = os.getenv("CB_PASSWORD") or 'password'
CB_BUCKET_NAME = os.getenv("CB_BUCKET_NAME") or 'vector-search-testing'
INDEX_NAME = os.getenv("INDEX_NAME") or 'vector_search_jina'

SCOPE_NAME = os.getenv("SCOPE_NAME") or 'shared'
COLLECTION_NAME = os.getenv("COLLECTION_NAME") or 'jina'
CACHE_COLLECTION = os.getenv("CACHE_COLLECTION") or 'cache'

# Check if the variables are correctly loaded
if not JINA_API_KEY:
    raise ValueError("JINA_API_KEY environment variable is not set")
if not JINACHAT_API_KEY:
    raise ValueError("JINACHAT_API_KEY environment variable is not set")
```

## Couchbase Connection Setup

### Connect to Cluster

Establish a connection to the Couchbase cluster for vector storage and retrieval operations.


```python
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")
```

    2025-10-08 11:18:34,736 - INFO - Successfully connected to Couchbase


### Setup Collections

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks if the specified bucket exists, and creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: You will not be able to create a bucket programmatically on Capella; create it manually from the Capella UI instead

2. Scope Management:
   - Verifies if the requested scope exists within the bucket
   - Creates a new scope if needed (unless it's the default "_default" scope)

3. 
Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Clears any existing documents for clean state - -The function is called twice to set up: -1. Main collection for vector embeddings -2. Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) -``` - - 2025-10-08 11:18:36,208 - INFO - Bucket 'vector-search-testing' exists. - 2025-10-08 11:18:36,219 - INFO - Collection 'jina' already exists. Skipping creation. - 2025-10-08 11:18:38,322 - INFO - All documents cleared from the collection. - 2025-10-08 11:18:38,322 - INFO - Bucket 'vector-search-testing' exists. - 2025-10-08 11:18:38,327 - INFO - Collection 'jina_cache' already exists. Skipping creation. - 2025-10-08 11:18:40,480 - INFO - All documents cleared from the collection. 
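If you want to double-check what `setup_collection()` produced, a minimal sketch like the following lists the collections that now exist in the target scope. It is optional and only reuses the SDK objects and configuration variables defined above:

```python
# Optional sanity check: list the collections in the scope we just configured.
bucket = cluster.bucket(CB_BUCKET_NAME)
for scope in bucket.collections().get_all_scopes():
    if scope.name == SCOPE_NAME:
        print(f"Scope '{scope.name}' contains: {[c.name for c in scope.collections]}")
```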
- - - - - - - - - -## Document Processing and Vector Store - -### Create Jina Embeddings - -Set up Jina AI embeddings to convert text into high-dimensional vectors that capture semantic meaning for similarity search. - - -```python -try: - embeddings = JinaEmbeddings( - jina_api_key=JINA_API_KEY, model_name="jina-embeddings-v3" - ) - logging.info("Successfully created JinaEmbeddings") -except Exception as e: - raise ValueError(f"Error creating JinaEmbeddings: {str(e)}") -``` - - 2025-10-08 11:18:56,191 - INFO - Successfully created JinaEmbeddings - - -### Create GSI Vector Store - -Set up the GSI vector store for high-performance vector storage and similarity search using Couchbase's Query Service. - - -```python -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - distance_metric=DistanceStrategy.COSINE - ) - logging.info("Successfully created GSI vector store") -except Exception as e: - raise ValueError(f"Failed to create GSI vector store: {str(e)}") -``` - - 2025-10-08 11:18:57,341 - INFO - Successfully created GSI vector store - - -### Index Creation Timing - -**Important**: GSI Vector Indexes must be created AFTER uploading vector data. The index creation process analyzes existing vectors to optimize search performance through clustering and quantization. - -### Load Dataset - -Load the BBC News dataset for real-world testing data with authentic news articles covering various topics. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-10-08 11:19:03,903 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -#### Clean Data - -Remove duplicate articles to ensure clean search results. - - -```python -news_articles = news_dataset["content"] -unique_articles = set() -for article in news_articles: - if article: - unique_articles.add(article) -unique_news_articles = list(unique_articles) -print(f"We have {len(unique_news_articles)} unique articles in our database.") -``` - - We have 1749 unique articles in our database. - - -#### Store Data - -Process articles in batches and store them in the vector database with embeddings. We'll use 60% of the dataset for faster processing while maintaining good search quality. - - -```python -# Calculate 60% of the dataset size and round to nearest integer -dataset_size = len(unique_news_articles) -subset_size = round(dataset_size * 0.6) - -# Filter articles by length and create subset -filtered_articles = [article for article in unique_news_articles[:subset_size] - if article and len(article) <= 50000] - -# Process in batches -batch_size = 50 - -try: - vector_store.add_texts( - texts=filtered_articles, - batch_size=batch_size - ) - logging.info("Document ingestion completed successfully") - -except CouchbaseException as e: - logging.error(f"Couchbase error during ingestion: {str(e)}") - raise RuntimeError(f"Error performing document ingestion: {str(e)}") -except Exception as e: - if "Payment Required" in str(e): - logging.error("Payment required for Jina AI API. 
Please check your subscription status and API key.") - print("To resolve this error:") - print("1. Visit 'https://jina.ai/reader/#pricing' to review subscription options") - print("2. Ensure your API key is valid and has sufficient credits") - print("3. Consider upgrading your subscription plan if needed") - else: - logging.error(f"Unexpected error during ingestion: {str(e)}") - raise RuntimeError(f"Failed to save documents to vector store: {str(e)}") -``` - - 2025-10-08 11:20:18,363 - INFO - Document ingestion completed successfully - - -## Vector Search Performance Testing - -Now let's demonstrate the performance benefits of GSI optimization by testing pure vector search performance. We'll compare three optimization levels: - -1. **Baseline Performance**: Vector search without GSI optimization -2. **GSI-Optimized Performance**: Same search with BHIVE GSI index -3. **Cache Benefits**: Show how caching can be applied on top of GSI for repeated queries - -### GSI Vector Index Types Overview - -Before we start testing, let's understand the index types available: - -**Hyperscale Vector Indexes (BHIVE):** -- **Best for**: Pure vector searches - content discovery, recommendations, semantic search -- **Performance**: High performance with low memory footprint, designed to scale to billions of vectors -- **Optimization**: Optimized for concurrent operations, supports simultaneous searches and inserts -- **Use when**: You primarily perform vector-only queries without complex scalar filtering -- **Ideal for**: Large-scale semantic search, recommendation systems, content discovery - -**Composite Vector Indexes:** -- **Best for**: Filtered vector searches that combine vector search with scalar value filtering -- **Performance**: Efficient pre-filtering where scalar attributes reduce the vector comparison scope -- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- **Ideal for**: Compliance-based filtering, user-specific searches, time-bounded queries -- **Note**: Scalar filters take precedence over vector similarity - -**Choosing the Right Index Type:** -- Start with Hyperscale Vector Index for pure vector searches and large datasets -- Use Composite Vector Index when scalar filters significantly reduce your search space -- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions - -For this tutorial, we'll use **BHIVE** as it's optimized for pure semantic search scenarios. 
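Before creating a new index, it can be useful to see which GSI indexes already exist on the bucket and scope. The following is a sketch using the standard `system:indexes` catalog; the exact fields returned can vary slightly across server versions:

```python
# A sketch: list existing GSI indexes for our bucket and scope via system:indexes.
index_query = (
    "SELECT idx.name, idx.state, idx.`using` FROM system:indexes AS idx "
    f"WHERE idx.bucket_id = '{CB_BUCKET_NAME}' AND idx.scope_id = '{SCOPE_NAME}'"
)
for row in cluster.query(index_query):
    print(row)
```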

### Index Configuration Details

The `index_description` parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

**Format**: `IVF[<centroids>],{PQ|SQ}<settings>`

#### **IVF (Inverted File Index) - Centroids Configuration**
- **Purpose**: Controls how the dataset is subdivided into clusters for faster searches
- **Trade-offs**: More centroids = faster searches but slower training time
- **Auto-selection**: If omitted (e.g., `IVF,SQ8`), Couchbase automatically selects the optimal number based on dataset size
- **Manual setting**: Specify an exact count (e.g., `IVF1000,SQ8` for 1000 centroids)

#### **Quantization Options - Vector Compression**

**SQ (Scalar Quantization)**
- **Purpose**: Compresses vectors by reducing the precision of individual components
- **Settings**: `SQ4`, `SQ6`, `SQ8` (4-bit, 6-bit, 8-bit precision)
- **Trade-off**: Fewer bits = more compression but less precision
- **Best for**: General-purpose applications where some precision loss is acceptable

**PQ (Product Quantization)**
- **Purpose**: Advanced compression using subquantizers for better precision
- **Format**: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8` = 32 subquantizers of 8 bits each)
- **Trade-off**: More complex, but often better precision than SQ at similar compression ratios
- **Best for**: Applications requiring high precision with significant compression

#### **Common Configuration Examples**

```
IVF,SQ8        # Auto-selected centroids with 8-bit scalar quantization (recommended default)
IVF1000,SQ6    # 1000 centroids with 6-bit scalar quantization (higher compression)
IVF,PQ32x8     # Auto-selected centroids with 32 subquantizers of 8 bits each
IVF500,PQ16x4  # 500 centroids with 16 subquantizers of 4 bits each (high compression)
```

#### **Performance Considerations**

**Distance Interpretation**: In GSI vector search, lower distance values indicate higher similarity, and higher distance values indicate lower similarity.

**Scalability**: BHIVE indexes can scale to billions of vectors with optimized concurrent operations, making them suitable for large-scale production deployments.

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).

For more information on GSI vector indexes, see the [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).

### Vector Search Test Function


```python
def test_vector_search_performance(vector_store, query, label="Vector Search"):
    """Test pure vector search performance and return timing metrics"""
    print(f"\n[{label}] Testing vector search performance")
    print(f"[{label}] Query: '{query}'")

    start_time = time.time()

    try:
        results = vector_store.similarity_search_with_score(query, k=3)
        end_time = time.time()
        search_time = end_time - start_time

        print(f"[{label}] Vector search completed in {search_time:.4f} seconds")
        print(f"[{label}] Found {len(results)} documents")

        if results:
            doc, distance = results[0]
            print(f"[{label}] Top result distance: {distance:.6f} (lower = more similar)")
            preview = doc.page_content[:100] + "..." 
if len(doc.page_content) > 100 else doc.page_content - print(f"[{label}] Top result preview: {preview}") - - return search_time - except Exception as e: - print(f"[{label}] Vector search failed: {str(e)}") - return None -``` - -### Test 1: Baseline Performance (No GSI Index) - -Test pure vector search performance without GSI optimization. - - -```python -# Test baseline vector search performance without GSI index -test_query = "What was manchester city manager pep guardiola's reaction to the team's current form?" -print("Testing baseline vector search performance without GSI optimization...") -baseline_time = test_vector_search_performance(vector_store, test_query, "Baseline Search") -print(f"\nBaseline vector search time (without GSI): {baseline_time:.4f} seconds\n") -``` - - Testing baseline vector search performance without GSI optimization... - - [Baseline Search] Testing vector search performance - [Baseline Search] Query: 'What was manchester city manager pep guardiola's reaction to the team's current form?' - [Baseline Search] Vector search completed in 0.8305 seconds - [Baseline Search] Found 3 documents - [Baseline Search] Top result distance: 0.457932 (lower = more similar) - [Baseline Search] Top result preview: 'Promised change, but Juventus are back in crisis' - - "We have entirely changed the way we think about... - - Baseline vector search time (without GSI): 0.8305 seconds - - - -### Create BHIVE GSI Index - -Now let's create a BHIVE GSI vector index to enable high-performance vector searches. The index creation is done programmatically through the vector store. - - -```python -# Create GSI Vector Index for high-performance searches -print("Creating BHIVE GSI vector index...") -try: - vector_store.create_index( - index_type=IndexType.BHIVE, # Use IndexType.COMPOSITE for Composite index - index_description="IVF,SQ8" - ) - print("GSI Vector index created successfully") - - # Wait for index to become available - print("Waiting for index to become available...") - time.sleep(5) - -except Exception as e: - if "already exists" in str(e).lower(): - print("GSI Vector index already exists, proceeding...") - else: - print(f"Error creating GSI index: {str(e)}") -``` - - Creating BHIVE GSI vector index... - GSI Vector index created successfully - Waiting for index to become available... - - -### Alternative: Composite Index Configuration - -If your use case requires complex filtering with scalar attributes, you can create a **Composite index** instead by changing the configuration above: - -```python -# Alternative: Create a Composite index for filtered searches -vector_store.create_index( - index_type=IndexType.COMPOSITE, # Instead of IndexType.BHIVE - index_description="IVF,SQ8" # Same quantization settings -) -``` - -### Test 2: GSI-Optimized Performance - -Test the same vector search with BHIVE GSI optimization. - - -```python -# Test vector search performance with GSI index -gsi_test_query = "What happened in the latest Premier League matches?" -print("Testing vector search performance with BHIVE GSI optimization...") -gsi_time = test_vector_search_performance(vector_store, gsi_test_query, "GSI-Optimized Search") -``` - - Testing vector search performance with BHIVE GSI optimization... - - [GSI-Optimized Search] Testing vector search performance - [GSI-Optimized Search] Query: 'What happened in the latest Premier League matches?' 
    [GSI-Optimized Search] Vector search completed in 0.6452 seconds
    [GSI-Optimized Search] Found 3 documents
    [GSI-Optimized Search] Top result distance: 0.394714 (lower = more similar)
    [GSI-Optimized Search] Top result preview: The latest updates and analysis from the BBC.


### Test 3: Cache Benefits Testing

Now let's demonstrate how caching can improve performance for repeated queries. **Note**: Caching benefits apply to both baseline and GSI-optimized searches.


```python
# Set up Couchbase cache (can be applied to any search approach)
print("Setting up Couchbase cache for improved performance on repeated queries...")
cache = CouchbaseCache(
    cluster=cluster,
    bucket_name=CB_BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=CACHE_COLLECTION,  # use the dedicated cache collection set up earlier
)
set_llm_cache(cache)
print("✓ Couchbase cache enabled!")
```

    Setting up Couchbase cache for improved performance on repeated queries...
    ✓ Couchbase cache enabled!


```python
# Test cache benefits with a different query to avoid interference
cache_test_query = "What are the latest football transfer developments?"

print("Testing cache benefits with vector search...")
print("First execution (cache miss):")
cache_time_1 = test_vector_search_performance(vector_store, cache_test_query, "Cache Test - First Run")

print("\nSecond execution (cache hit - should be faster):")
cache_time_2 = test_vector_search_performance(vector_store, cache_test_query, "Cache Test - Second Run")
```

    Testing cache benefits with vector search...
    First execution (cache miss):

    [Cache Test - First Run] Testing vector search performance
    [Cache Test - First Run] Query: 'What are the latest football transfer developments?'
    [Cache Test - First Run] Vector search completed in 0.9695 seconds
    [Cache Test - First Run] Found 3 documents
    [Cache Test - First Run] Top result distance: 0.394020 (lower = more similar)
    [Cache Test - First Run] Top result preview: The latest updates and analysis from the BBC.

    Second execution (cache hit - should be faster):

    [Cache Test - Second Run] Testing vector search performance
    [Cache Test - Second Run] Query: 'What are the latest football transfer developments?'
    [Cache Test - Second Run] Vector search completed in 0.5252 seconds
    [Cache Test - Second Run] Found 3 documents
    [Cache Test - Second Run] Top result distance: 0.394020 (lower = more similar)
    [Cache Test - Second Run] Top result preview: The latest updates and analysis from the BBC.
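The distances printed above are cosine distances (we configured `DistanceStrategy.COSINE`), so lower values mean closer matches. If you prefer an intuitive similarity score, a small helper like the following converts between the two. This is a minimal sketch; the `to_similarity` helper and the sample value are illustrative, not part of the original workflow:

```python
# Sketch: convert a cosine distance d (range [0, 2]) into a similarity score
# in [-1, 1], where values near 1 mean "very similar".
def to_similarity(distance: float) -> float:
    return 1.0 - distance

print(to_similarity(0.394020))  # top result above -> ~0.606
```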
- - -### Vector Search Performance Analysis - -Let's analyze the vector search performance improvements across all optimization levels: - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"Phase 1 - Baseline Search (No GSI): {baseline_time:.4f} seconds") -print(f"Phase 2 - GSI-Optimized Search: {gsi_time:.4f} seconds") -if cache_time_1 and cache_time_2: - print(f"Phase 3 - Cache Benefits:") - print(f" First execution (cache miss): {cache_time_1:.4f} seconds") - print(f" Second execution (cache hit): {cache_time_2:.4f} seconds") - -print("\n" + "-"*80) -print("VECTOR SEARCH OPTIMIZATION IMPACT:") -print("-"*80) - -# GSI improvement analysis -if baseline_time and gsi_time: - speedup = baseline_time / gsi_time if gsi_time > 0 else float('inf') - time_saved = baseline_time - gsi_time - percent_improvement = (time_saved / baseline_time) * 100 - print(f"GSI Index Benefit: {speedup:.2f}x faster ({percent_improvement:.1f}% improvement)") - -# Cache improvement analysis -if cache_time_1 and cache_time_2 and cache_time_2 < cache_time_1: - cache_speedup = cache_time_1 / cache_time_2 - cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100 - print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)") -else: - print(f"Cache Benefit: Variable (depends on query complexity and caching mechanism)") - -print(f"\nKey Insights for Vector Search Performance:") -print(f"• GSI BHIVE indexes provide significant performance improvements for vector similarity search") -print(f"• Performance gains are most dramatic for complex semantic queries") -print(f"• BHIVE optimization is particularly effective for high-dimensional embeddings") -print(f"• Combined with proper quantization (SQ8), GSI delivers production-ready performance") -print(f"• These performance improvements directly benefit any application using the vector store") -``` - - - ================================================================================ - VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - Phase 1 - Baseline Search (No GSI): 0.8305 seconds - Phase 2 - GSI-Optimized Search: 0.6452 seconds - Phase 3 - Cache Benefits: - First execution (cache miss): 0.9695 seconds - Second execution (cache hit): 0.5252 seconds - - -------------------------------------------------------------------------------- - VECTOR SEARCH OPTIMIZATION IMPACT: - -------------------------------------------------------------------------------- - GSI Index Benefit: 1.29x faster (22.3% improvement) - Cache Benefit: 1.85x faster (45.8% improvement) - - Key Insights for Vector Search Performance: - • GSI BHIVE indexes provide significant performance improvements for vector similarity search - • Performance gains are most dramatic for complex semantic queries - • BHIVE optimization is particularly effective for high-dimensional embeddings - • Combined with proper quantization (SQ8), GSI delivers production-ready performance - • These performance improvements directly benefit any application using the vector store - - -## Jina AI RAG Demo - -### What is RAG (Retrieval-Augmented Generation)? - -Now that we've optimized our vector search performance, let's demonstrate how to build a complete RAG system using Jina AI. RAG combines the power of our GSI-optimized semantic search with language model generation: - -1. 
**Query Processing**: User question is converted to vector embedding using Jina AI -2. **Document Retrieval**: GSI BHIVE index finds most relevant documents (now with proven performance improvements) -3. **Context Assembly**: Retrieved documents provide factual context for the language model -4. **Response Generation**: Jina's language model generates intelligent answers grounded in the retrieved data - -This demo shows how the vector search performance improvements we validated directly enhance the RAG workflow efficiency. - -### Create Jina Language Model - -Initialize Jina's chat model for generating intelligent responses based on our GSI-optimized retrieval system. - - -```python -print("Setting up Jina AI language model for RAG demo...") -try: - llm = JinaChat(temperature=0.1, jinachat_api_key=JINACHAT_API_KEY) - print("✓ JinaChat language model created successfully") - logging.info("Successfully created JinaChat") -except Exception as e: - print(f"✗ Error creating JinaChat: {str(e)}") - print("Please check your JINACHAT_API_KEY and network connection.") - raise -``` - - 2025-10-08 11:24:30,099 - INFO - Successfully created JinaChat - - - Setting up Jina AI language model for RAG demo... - ✓ JinaChat language model created successfully - - -### Build Optimized RAG Pipeline - -Create the complete RAG pipeline that integrates our GSI-optimized vector search with Jina's language model. - - -```python -try: - # Create RAG prompt template for structured responses - template = """You are a helpful assistant that answers questions based on the provided context. - If you cannot answer based on the context provided, respond with a generic answer. - Answer the question as truthfully as possible using the context below: - - Context: - {context} - - Question: {question} - - Answer:""" - - prompt = ChatPromptTemplate.from_template(template) - - # Build the RAG chain: GSI-Optimized Retrieval → Context → Generation → Output - rag_chain = ( - { - "context": vector_store.as_retriever(search_kwargs={"k": 2}), - "question": RunnablePassthrough() - } - | prompt - | llm - | StrOutputParser() - ) - print("Optimized RAG pipeline created successfully") - print("Components: GSI BHIVE Vector Search → Context Assembly → Jina Language Model → Response") -except Exception as e: - raise ValueError(f"Error creating RAG pipeline: {str(e)}") -``` - - Optimized RAG pipeline created successfully - Components: GSI BHIVE Vector Search → Context Assembly → Jina Language Model → Response - - -### RAG Demo with Optimized Search - -Test the complete RAG system leveraging our GSI performance optimizations. - - -```python -print("Testing RAG System with GSI-Optimized Vector Search") -print("=" * 60) - -try: - # Test with a specific query - sample_query = "What are the new eligibility rules for transgender women competing in leading women's golf tours, and what prompted these changes?" - print(f"User Query: {sample_query}") - print("\nProcessing with optimized pipeline...") - print("1. Converting query to vector embedding with Jina AI") - print("2. Searching GSI BHIVE index for relevant documents (optimized)") - print("3. Assembling context from retrieved documents") - print("4. 
Generating intelligent response with JinaChat") - - start_time = time.time() - rag_response = rag_chain.invoke(sample_query) - end_time = time.time() - - print(f"\nRAG Response (completed in {end_time - start_time:.2f} seconds):") - print("-" * 60) - print(rag_response) - -except Exception as e: - if "Payment Required" in str(e): - print("\nPayment required for Jina AI API.") - print("To resolve:") - print("• Visit https://jina.ai/reader/#pricing for subscription options") - print("• Ensure your API key is valid and has sufficient credits") - else: - print(f"Error: {str(e)}") -``` - - Testing RAG System with GSI-Optimized Vector Search - ============================================================ - User Query: What are the new eligibility rules for transgender women competing in leading women's golf tours, and what prompted these changes? - - Processing with optimized pipeline... - 1. Converting query to vector embedding with Jina AI - 2. Searching GSI BHIVE index for relevant documents (optimized) - 3. Assembling context from retrieved documents - 4. Generating intelligent response with JinaChat - - RAG Response (completed in 4.25 seconds): - ------------------------------------------------------------ - The new eligibility rules for transgender women competing in leading women's golf tours starting from 2025 prevent transgender women who have gone through male puberty from participating. Female players protesting led to these changes, as they called for policies to prevent those recorded as male at birth from competing in women's events. - - -### Multiple Query RAG Demo - -Test the RAG system with various queries to demonstrate the benefits of our optimized vector search. - - -```python -print("\nTesting Optimized RAG System with Multiple Queries") -print("=" * 55) - -try: - test_queries = [ - "What happened in the car incident on Shaftesbury Avenue in London?", - "What did King Charles talk about in his recent Christmas speech?", - ] - - for i, query in enumerate(test_queries, 1): - print(f"\n--- RAG Query {i} ---") - print(f"Question: {query}") - - start_time = time.time() - response = rag_chain.invoke(query) - end_time = time.time() - - print(f"Response (completed in {end_time - start_time:.2f} seconds): {response}") - -except Exception as e: - if "Payment Required" in str(e): - print("Payment required for Jina AI API.") - else: - print(f"Error: {str(e)}") - -print(f"\n✅ RAG demo completed successfully!") -print("✅ The system leverages GSI BHIVE optimization for fast document retrieval!") -print("✅ Jina AI provides high-quality embeddings and intelligent response generation!") -``` - - - Testing Optimized RAG System with Multiple Queries - ======================================================= - - --- RAG Query 1 --- - Question: What happened in the car incident on Shaftesbury Avenue in London? - Response (completed in 3.32 seconds): ### Answer: - A 31-year-old man was arrested on suspicion of attempted murder after driving a car on the wrong side of the road in Shaftesbury Avenue, London, injuring four pedestrians. The incident was treated as an isolated incident and was not terror-related. - - --- RAG Query 2 --- - Question: What did King Charles talk about in his recent Christmas speech? - Response (completed in 0.74 seconds): ### King Charles's Recent Christmas Speech Highlights: - - - Visited a Christmas market at Battersea Power Station. - - Met with Apple chief Tim Cook at Apple's UK headquarters. - - Interacted with carol singers, Christmas shoppers, and stallholders. 
    - Explored the power station and visited stalls at the Curated Makers Market.

    ✅ RAG demo completed successfully!
    ✅ The system leverages GSI BHIVE optimization for fast document retrieval!
    ✅ Jina AI provides high-quality embeddings and intelligent response generation!


## Conclusion

You've successfully built a high-performance semantic search engine combining:
- **Couchbase GSI BHIVE indexes** for optimized vector search
- **Jina AI embeddings and language models** for intelligent processing
- **Complete RAG pipeline** with caching optimization
diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Openrouter_Deepseek.md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Openrouter_Deepseek.md
deleted file mode 100644
index 2512f9f..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Openrouter_Deepseek.md
+++ /dev/null
@@ -1,815 +0,0 @@
---
# frontmatter
path: "/tutorial-openrouter-deepseek-with-global-secondary-index"
title: Retrieval-Augmented Generation with Couchbase and OpenRouter Deepseek using GSI index
short_title: RAG with Couchbase and OpenRouter Deepseek using GSI index
description:
  - Learn how to build a semantic search engine using Couchbase and OpenRouter with Deepseek using a GSI index.
  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with OpenRouter Deepseek as the language model and OpenAI as the embeddings provider.
  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - Artificial Intelligence
  - LangChain
  - Deepseek
  - OpenRouter
sdk_language:
  - python
length: 60 Mins
---


[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/openrouter-deepseek/gsi/RAG_with_Couchbase_and_Openrouter_Deepseek.ipynb)

# Introduction

In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [Deepseek V3](https://deepseek.ai/) as the language model provider (via OpenRouter or its direct API), and OpenAI for embeddings. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system using a GSI (Global Secondary Index) from scratch. Alternatively, if you want to perform semantic search using the FTS index, take a look at [this tutorial](https://developer.couchbase.com/tutorial-openrouter-deepseek-with-fts/).

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/openrouter-deepseek/RAG_with_Couchbase_and_Openrouter_Deepseek.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start

## Get Credentials for OpenRouter and Deepseek
* Sign up for an account at [OpenRouter](https://openrouter.ai/) to get your API key
* OpenRouter provides access to Deepseek models, so no separate Deepseek credentials are needed
* Store your OpenRouter API key securely, as it will be used to access the models
* For [Deepseek](https://deepseek.ai/) models, you can use the default models provided by OpenRouter

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

Note: To run this tutorial, you will need Capella with Couchbase Server version 8.0 or above, as GSI vector search is supported only from version 8.0.

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

## Setting the Stage: Installing Necessary Libraries

To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks.


```python
%pip install --quiet datasets==3.5.0 langchain-couchbase==0.5.0 langchain-deepseek==0.1.3 langchain-openai==0.3.13 python-dotenv==1.1.1
```

    Note: you may need to restart the kernel to use updated packages.


## Importing Necessary Libraries

The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading.


```python
import getpass
import json
import logging
import os
import time
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (CouchbaseException,
                                  InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts.chat import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore
from langchain_couchbase.vectorstores import DistanceStrategy
from langchain_couchbase.vectorstores import IndexType
from langchain_openai import OpenAIEmbeddings
```

## Setup Logging

Logging is configured to track the progress of the script and capture any errors or warnings.

```python
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Suppress httpx logging
logging.getLogger('httpx').setLevel(logging.CRITICAL)
```

## Environment Variables and Configuration

This section handles loading and validating environment variables and configuration settings:

1. API Keys:
   - Supports either direct Deepseek API or OpenRouter API access
   - Prompts for API key input if not found in environment
   - Requires OpenAI API key for embeddings

2. Couchbase Settings:
   - Connection details (host, username, password)
   - Bucket, scope and collection names
   - Vector search index configuration
   - Cache collection settings

The code validates that all required credentials are present before proceeding. It allows flexible configuration through environment variables or interactive prompts, with sensible defaults for local development.


```python
# Load environment variables from .env file if it exists
load_dotenv(override=True)

# API Keys
# Allow either Deepseek API directly or via OpenRouter
DEEPSEEK_API_KEY = os.getenv('DEEPSEEK_API_KEY')
OPENROUTER_API_KEY = os.getenv('OPENROUTER_API_KEY')

if not DEEPSEEK_API_KEY and not OPENROUTER_API_KEY:
    api_choice = input('Choose API (1 for Deepseek direct, 2 for OpenRouter): ')
    if api_choice == '1':
        DEEPSEEK_API_KEY = getpass.getpass('Enter your Deepseek API Key: ')
    else:
        OPENROUTER_API_KEY = getpass.getpass('Enter your OpenRouter API Key: ')

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API Key: ')

# Couchbase Settings
CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: query-vector-search-testing): ') or 'query-vector-search-testing'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: deepseek): ') or 'deepseek'
CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache'

# Check if required credentials are set
required_creds = {
    'OPENAI_API_KEY': OPENAI_API_KEY,
    'CB_HOST': CB_HOST,
    'CB_USERNAME': CB_USERNAME,
    'CB_PASSWORD': CB_PASSWORD,
    'CB_BUCKET_NAME': CB_BUCKET_NAME
}

# Add the API key that was chosen
if DEEPSEEK_API_KEY:
    required_creds['DEEPSEEK_API_KEY'] = DEEPSEEK_API_KEY
elif OPENROUTER_API_KEY:
    required_creds['OPENROUTER_API_KEY'] = OPENROUTER_API_KEY
else:
    raise ValueError("Either Deepseek API Key or OpenRouter API Key must be provided")

for cred_name, cred_value in required_creds.items():
    if not cred_value:
        raise ValueError(f"{cred_name} is not set")
```

# Connecting to the Couchbase Cluster
Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. 
By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.


```python
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")
```

    2025-09-17 15:40:27,133 - INFO - Successfully connected to Couchbase


## Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks if the specified bucket exists, and creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: If you are using Capella, create a bucket manually called query-vector-search-testing (or any name you prefer) with the same properties.

2. Scope Management:
   - Verifies if the requested scope exists within the bucket
   - Creates a new scope if needed (unless it's the default "_default" scope)

3. Collection Setup:
   - Checks for collection existence within the scope
   - Creates the collection if it doesn't exist
   - Waits 2 seconds for the collection to be ready

Additional Tasks:
- Clears any existing documents for a clean state
- Implements comprehensive error handling and logging

The function is called twice to set up:
1. Main collection for vector embeddings
2. Cache collection for storing results


```python
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)

        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-09-17 15:41:01,398 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-17 15:41:01,410 - INFO - Collection 'deepseek' does not exist. Creating it... - 2025-09-17 15:41:01,453 - INFO - Collection 'deepseek' created successfully. - 2025-09-17 15:41:03,712 - INFO - All documents cleared from the collection. - 2025-09-17 15:41:03,713 - INFO - Bucket 'query-vector-search-testing' exists. - 2025-09-17 15:41:03,728 - INFO - Collection 'cache' already exists. Skipping creation. - 2025-09-17 15:41:05,821 - INFO - All documents cleared from the collection. - - - - - - - - - -## Creating the Embeddings client -This section creates an OpenAI embeddings client using the OpenAI API key. -The embeddings client is configured to use the "text-embedding-3-small" model, -which converts text into numerical vector representations. -These vector embeddings are essential for semantic search and similarity matching. -The client will be used by the vector store to generate embeddings for documents. - - -```python -try: - embeddings = OpenAIEmbeddings( - api_key=OPENAI_API_KEY, - model="text-embedding-3-small" - ) - logging.info("Successfully created OpenAI embeddings client") -except Exception as e: - raise ValueError(f"Error creating OpenAI embeddings client: {str(e)}") -``` - - 2025-09-17 15:41:27,149 - INFO - Successfully created OpenAI embeddings client - - -## Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. - - -```python -try: - vector_store = CouchbaseQueryVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding = embeddings, - distance_metric=DistanceStrategy.COSINE - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - - 2025-09-17 15:41:55,394 - INFO - Successfully created vector store - - -## Load the BBC News Dataset -To build a search engine, we need data to search through. 
We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.


```python
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
```

    2025-09-17 15:42:04,530 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


    Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.


```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.


## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches of 50 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:
1. Memory Efficiency: Processing in smaller batches prevents memory overload
2. Progress Tracking: Easier to monitor and track the ingestion progress
3. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 50 to ensure reliable operation. The optimal batch size depends on many factors, including:
- Document sizes being inserted
- Available system resources
- Network conditions
- Concurrent workload

Consider measuring performance with your specific workload before adjusting.



```python
batch_size = 50

# Automatic Batch Processing
articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=batch_size
    )
    logging.info("Document ingestion completed successfully.")
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")
```

    2025-09-17 16:08:51,054 - INFO - Document ingestion completed successfully.
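If you want to double-check the ingestion before moving on, one option is a quick SQL++ count over the collection. This is a minimal sketch that reuses the `cluster` connection and the bucket, scope, and collection names configured earlier; it is an optional sanity check, not part of the original pipeline:


```python
# Optional sanity check: count the documents that were just ingested.
# Reuses the cluster connection and keyspace names defined earlier in this tutorial.
count_query = f"SELECT COUNT(*) AS doc_count FROM `{CB_BUCKET_NAME}`.`{SCOPE_NAME}`.`{COLLECTION_NAME}`"
for row in cluster.query(count_query):
    print(f"Documents in collection: {row['doc_count']}")
```

The count should roughly match the number of filtered articles ingested above.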
- - -## Setting Up the LLM Model -In this section, we set up the Large Language Model (LLM) for our RAG system. We're using the Deepseek model, which can be accessed through two different methods: - -1. **Deepseek API Key**: This is obtained directly from Deepseek's platform (https://deepseek.ai) by creating an account and subscribing to their API services. With this key, you can access Deepseek's models directly using the `ChatDeepSeek` class from the `langchain_deepseek` package. - -2. **OpenRouter API Key**: OpenRouter (https://openrouter.ai) is a service that provides unified access to multiple LLM providers, including Deepseek. You can obtain an API key by creating an account on OpenRouter's website. This approach uses the `ChatOpenAI` class from `langchain_openai` but with a custom base URL pointing to OpenRouter's API endpoint. - -The key difference is that OpenRouter acts as an intermediary service that can route your requests to various LLM providers, while the Deepseek API gives you direct access to only Deepseek's models. OpenRouter can be useful if you want to switch between different LLM providers without changing your code significantly. - -In our implementation, we check for both keys and prioritize using the Deepseek API directly if available, falling back to OpenRouter if not. The model is configured with temperature=0 to ensure deterministic, focused responses suitable for RAG applications. - - - -```python -from langchain_deepseek import ChatDeepSeek -from langchain_openai import ChatOpenAI - -if DEEPSEEK_API_KEY: - try: - llm = ChatDeepSeek( - api_key=DEEPSEEK_API_KEY, - model_name="deepseek-chat", - temperature=0 - ) - logging.info("Successfully created Deepseek LLM client") - except Exception as e: - raise ValueError(f"Error creating Deepseek LLM client: {str(e)}") -elif OPENROUTER_API_KEY: - try: - llm = ChatOpenAI( - api_key=OPENROUTER_API_KEY, - base_url="https://openrouter.ai/api/v1", - model="deepseek/deepseek-chat-v3.1", - temperature=0, - ) - logging.info("Successfully created Deepseek LLM client through OpenRouter") - except Exception as e: - raise ValueError(f"Error creating Deepseek LLM client: {str(e)}") -else: - raise ValueError("Either Deepseek API Key or OpenRouter API Key must be provided") -``` - - 2025-09-18 11:18:25,192 - INFO - Successfully created Deepseek LLM client through OpenRouter - - -# Perform Semantic Search -Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined. Common metrics include cosine similarity, Euclidean distance, or dot product, but other metrics can be implemented based on specific use cases. Different embedding models like BERT, Word2Vec, or GloVe can also be used depending on the application's needs, with the vectors generated by these models stored and searched within Couchbase itself. - -In the provided code, the search process begins by recording the start time, followed by executing the `similarity_search_with_score` method of the `CouchbaseQueryVectorStore`. This method searches Couchbase for the most relevant documents based on the vector similarity to the query. 
The search results include the document content and the distance that reflects how closely each document aligns with the query in the defined semantic space. The time taken to perform this search is then calculated and logged, and the results are displayed, showing the most relevant documents along with their similarity scores. This approach leverages Couchbase as both a storage and retrieval engine for vector data, enabling efficient and scalable semantic searches. The integration of vector storage and search capabilities within Couchbase allows for sophisticated semantic search operations without relying on external services for vector storage or comparison. - - -```python -query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?" - -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) - - for doc, score in search_results: - print(f"Distance: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-09-17 16:11:07,177 - INFO - Semantic search completed in 2.46 seconds - - - - Semantic Search Results (completed in 2.46 seconds): - -------------------------------------------------------------------------------- - Distance: 0.3693, Text: The Littler effect - how darts hit the bullseye - - Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson. - - One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers. - - Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. 
A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - Littler beat Luke Humphries to win the Premier League title in May - - The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. "When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m." - - Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. 
There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November. - • None Know a lot about Littler? Take our quiz - -------------------------------------------------------------------------------- - Distance: 0.3900, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. 
Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97.
    
    Littler was hugged by his parents after victory over Meikle
    
    ... (output truncated for brevity)


# Optimizing Vector Search with Global Secondary Index (GSI)

While the above semantic search using similarity_search_with_score works effectively, we can significantly improve query performance by leveraging Global Secondary Indexes (GSI) in Couchbase.

Couchbase offers three types of vector indexes, but for GSI-based vector search we focus on two main types:

**Hyperscale Vector Indexes (BHIVE)**
- Best for pure vector searches - content discovery, recommendations, semantic search
- High performance with low memory footprint - designed to scale to billions of vectors
- Optimized for concurrent operations - supports simultaneous searches and inserts
- Use when: You primarily perform vector-only queries without complex scalar filtering
- Ideal for: Large-scale semantic search, recommendation systems, content discovery

**Composite Vector Indexes**
- Best for filtered vector searches - combines vector search with scalar value filtering
- Efficient pre-filtering - scalar attributes reduce the vector comparison scope
- Use when: Your queries combine vector similarity with scalar filters that eliminate large portions of data
- Ideal for: Compliance-based filtering, user-specific searches, time-bounded queries

**Choosing the Right Index Type**
- Start with Hyperscale Vector Index for pure vector searches and large datasets
- Use Composite Vector Index when scalar filters significantly reduce your search space
- Consider your dataset size: Hyperscale scales to billions, Composite works well for tens of millions to billions

For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).


## Understanding Index Configuration (Couchbase 8.0 Feature)

The index_description parameter controls how Couchbase optimizes vector storage and search performance through centroids and quantization:

Format: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}`

Centroids (IVF - Inverted File):
- Controls how the dataset is subdivided for faster searches
- More centroids = faster search, slower training
- Fewer centroids = slower search, faster training
- If omitted (as in IVF,SQ8), Couchbase auto-selects based on dataset size

Quantization Options:
- SQ (Scalar Quantization): SQ4, SQ6, SQ8 (4, 6, or 8 bits per dimension)
- PQ (Product Quantization): PQ<subquantizers>x<bits> (e.g., PQ32x8)
- Higher values = better accuracy, larger index size

Common Examples:
- IVF,SQ8 - Auto centroids, 8-bit scalar quantization (good default)
- IVF1000,SQ6 - 1000 centroids, 6-bit scalar quantization
- IVF,PQ32x8 - Auto centroids, 32 subquantizers with 8 bits

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).

In the code below, we demonstrate creating a BHIVE index with the vector store's create_index method, which takes an index type (BHIVE or COMPOSITE) and a description parameter for optimization settings.
Alternatively, GSI indexes can be created manually from the Couchbase UI.


```python
vector_store.create_index(index_type=IndexType.BHIVE, index_name="openrouterdeepseek_bhive_index", index_description="IVF,SQ8")
```

The example below shows running the same similarity search, but now using the BHIVE GSI index we created above. You'll notice improved performance as the index efficiently retrieves data.

**Important**: When using Composite indexes, scalar filters take precedence over vector similarity, which can improve performance for filtered searches but may miss some semantically relevant results that don't match the scalar criteria.

Note: In GSI vector search, the score represents the vector distance between the query and document embeddings. A lower distance indicates higher similarity, while a higher distance indicates lower similarity.


```python
query = "What were Luke Littler's key achievements and records in his recent PDC World Championship match?"

try:
    # Perform the semantic search
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    search_elapsed_time = time.time() - start_time

    logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds")

    # Display search results
    print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):")
    print("-" * 80)

    for doc, score in search_results:
        print(f"Distance: {score:.4f}, Text: {doc.page_content}")
        print("-" * 80)

except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")
except Exception as e:
    raise RuntimeError(f"Unexpected error: {str(e)}")
```

    2025-09-18 11:17:19,626 - INFO - Semantic search completed in 0.88 seconds



    Semantic Search Results (completed in 0.88 seconds):
    --------------------------------------------------------------------------------
    Distance: 0.3694, Text: The Littler effect - how darts hit the bullseye
    
    Teenager Luke Littler began his bid to win the 2025 PDC World Darts Championship with a second-round win against Ryan Meikle. Here we assess Littler's impact after a remarkable rise which saw him named BBC Young Sports Personality of the Year and runner-up in the main award to athlete Keely Hodgkinson.
    
    One year ago, he was barely a household name in his own home. Now he is a sporting phenomenon. After emerging from obscurity aged 16 to reach the World Championship final, the life of Luke Littler and the sport he loves has been transformed. Viewing figures, ticket sales and social media interest have rocketed. Darts has hit the bullseye. This Christmas more than 100,000 children are expected to be opening Littler-branded magnetic dartboards as presents. His impact has helped double the number of junior academies, prompted plans to expand the World Championship and generated interest in darts from Saudi Arabian backers.
    
    Just months after taking his GCSE exams and ranked 164th in the world, Littler beat former champions Raymond van Barneveld and Rob Cross en route to the PDC World Championship final in January, before his run ended with a 7-4 loss to Luke Humphries. With his nickname 'The Nuke' on his purple and yellow shirt and the Alexandra Palace crowd belting out his walk-on song, Pitbull's tune Greenlight, he became an instant hit. Electric on the stage, calm off it. The down-to-earth teenager celebrated with a kebab and computer games. "We've been watching his progress since he was about seven. 
He was on our radar, but we never anticipated what would happen. The next thing we know 'Littlermania' is spreading everywhere," PDC president Barry Hearn told BBC Sport. A peak TV audience of 3.7 million people watched the final - easily Sky's biggest figure for a non-football sporting event. The teenager from Warrington in Cheshire was too young to legally drive or drink alcohol, but earned £200,000 for finishing second - part of £1m prize money in his first year as a professional - and an invitation to the elite Premier League competition. He turned 17 later in January but was he too young for the demanding event over 17 Thursday nights in 17 locations? He ended up winning the whole thing, and hit a nine-dart finish against Humphries in the final. From Bahrain to Wolverhampton, Littler claimed 10 titles in 2024 and is now eyeing the World Championship. - - As he progressed at the Ally Pally, the Manchester United fan was sent a good luck message by the club's former midfielder and ex-England captain David Beckham. In 12 months, Littler's Instagram followers have risen from 4,000 to 1.3m. Commercial backers include a clothing range, cereal firm and train company and he will appear in a reboot of the TV darts show Bullseye. Google say he was the most searched-for athlete online in the UK during 2024. On the back of his success, Littler darts, boards, cabinets, shirts are being snapped up in big numbers. "This Christmas the junior magnetic dartboard is selling out, we're talking over 100,000. They're 20 quid and a great introduction for young children," said Garry Plummer, the boss of sponsors Target Darts, who first signed a deal with Littler's family when he was aged 12. "All the toy shops want it, they all want him - 17, clean, doesn't drink, wonderful." - - Littler beat Luke Humphries to win the Premier League title in May - - The number of academies for children under the age of 16 has doubled in the last year, says Junior Darts Corporation chairman Steve Brown. There are 115 dedicated groups offering youngsters equipment, tournaments and a place to develop, with bases including Australia, Bulgaria, Greece, Norway, USA and Mongolia. "We've seen so many inquiries from around the world, it's been such a boom. It took us 14 years to get 1,600 members and within 12 months we have over 3,000, and waiting lists," said Brown. "When I played darts as a child, I was quite embarrassed to tell my friends what my hobby was. All these kids playing darts now are pretty popular at school. It's a bit rock 'n roll and recognised as a cool thing to do." Plans are being hatched to extend the World Championship by four days and increase the number of players from 96 to 128. That will boost the number of tickets available by 25,000 to 115,000 but Hearn reckons he could sell three times as many. He says Saudi Arabia wants to host a tournament, which is likely to happen if no-alcohol regulations are relaxed. "They will change their rules in the next 12 months probably for certain areas having alcohol, and we'll take darts there and have a party in Saudi," he said. "When I got involved in darts, the total prize money was something like £300,000 for the year. This year it will go to £20m. I expect in five years' time, we'll be playing for £40m." - - Former electrician Cross charged to the 2018 world title in his first full season, while Adrian Lewis and Michael van Gerwen were multiple victors in their 20s and 16-time champion Phil ‘The Power’ Taylor is widely considered the greatest of all time. 
Littler is currently fourth in the world rankings, although that is based on a two-year Order of Merit. There have been suggestions from others the spotlight on the teenager means world number one Humphries, 29, has been denied the coverage he deserves, but no darts player has made a mark at such a young age as Littler. "Luke Humphries is another fabulous player who is going to be around for years. Sport is a very brutal world. It is about winning and claiming the high ground. There will be envy around," Hearn said. "Luke Littler is the next Tiger Woods for darts so they better get used to it, and the only way to compete is to get better." World number 38 Martin Lukeman was awestruck as he described facing a peak Littler after being crushed 16-3 in the Grand Slam final, with the teenager winning 15 consecutive legs. "I can't compete with that, it was like Godly. He was relentless, he is so good it's ridiculous," he said. Lukeman can still see the benefits he brings, adding: "What he's done for the sport is brilliant. If it wasn't for him, our wages wouldn't be going up. There's more sponsors, more money coming in, all good." Hearn feels future competition may come from players even younger than Littler. "I watched a 10-year-old a few months ago who averaged 104.89 and checked out a 4-3 win with a 136 finish. They smell the money, the fame and put the hard work in," he said. How much better Littler can get is guesswork, although Plummer believes he wants to reach new heights. "He never says 'how good was I?' But I think he wants to break records and beat Phil Taylor's 16 World Championships and 16 World Matchplay titles," he said. "He's young enough to do it." A version of this article was originally published on 29 November. - • None Know a lot about Littler? Take our quiz - -------------------------------------------------------------------------------- - Distance: 0.3901, Text: Luke Littler has risen from 164th to fourth in the rankings in a year - - A tearful Luke Littler hit a tournament record 140.91 set average as he started his bid for the PDC World Championship title with a dramatic 3-1 win over Ryan Meikle. The 17-year-old made headlines around the world when he reached the tournament final in January, where he lost to Luke Humphries. Starting this campaign on Saturday, Littler was millimetres away from a nine-darter when he missed double 12 as he blew Meikle away in the fourth and final set of the second-round match. Littler was overcome with emotion at the end, cutting short his on-stage interview. "It was probably the toughest game I've ever played. I had to fight until the end," he said later in a news conference. "As soon as the question came on stage and then boom, the tears came. It was just a bit too much to speak on stage. "It is the worst game I have played. I have never felt anything like that tonight." Admitting to nerves during the match, he told Sky Sports: "Yes, probably the biggest time it's hit me. Coming into it I was fine, but as soon as [referee] George Noble said 'game on', I couldn't throw them." Littler started slowly against Meikle, who had two darts for the opening set, but he took the lead by twice hitting double 20. Meikle did not look overawed against his fellow Englishman and levelled, but Littler won the third set and exploded into life in the fourth. The tournament favourite hit four maximum 180s as he clinched three straight legs in 11, 10 and 11 darts for a record set average, and 100.85 overall. 
Meanwhile, two seeds crashed out on Saturday night – five-time world champion Raymond van Barneveld lost to Welshman Nick Kenny, while England's Ryan Joyce beat Danny Noppert. Australian Damon Heta was another to narrowly miss out on a nine-darter, just failing on double 12 when throwing for the match in a 3-1 win over Connor Scutt. Ninth seed Heta hit four 100-plus checkouts to come from a set down against Scutt in a match in which both men averaged more than 97.
    
    Littler was hugged by his parents after victory over Meikle
    
    ... (output truncated for brevity)


Note: To create a COMPOSITE index, use the code below. Choose based on your specific use case and query patterns. For this tutorial's news search scenario, either index type would work, but BHIVE might be more efficient for pure semantic search across news articles.


```python
vector_store.create_index(index_type=IndexType.COMPOSITE, index_name="openrouterdeepseek_composite_index", index_description="IVF,SQ8")
```

## Setting Up a Couchbase Cache
To further optimize our system, we set up a Couchbase-based cache. A cache is a temporary storage layer that holds frequently accessed data, speeding up operations by reducing the need to repeatedly retrieve the same information from the database. In our setup, the cache will help us accelerate repetitive tasks, such as looking up similar documents. By implementing a cache, we enhance the overall performance of our search engine, ensuring that it can handle high query volumes and deliver results quickly.

Caching is particularly valuable in scenarios where users may submit similar queries multiple times or where certain pieces of information are frequently requested. By storing these in a cache, we can significantly reduce the time it takes to respond to these queries, improving the user experience.



```python
try:
    cache = CouchbaseCache(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=CACHE_COLLECTION,
    )
    logging.info("Successfully created cache")
    set_llm_cache(cache)
except Exception as e:
    raise ValueError(f"Failed to create cache: {str(e)}")
```

    2025-09-17 16:10:11,473 - INFO - Successfully created cache


## Retrieval-Augmented Generation (RAG) with Couchbase and LangChain
Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query's embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain.

The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase's efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation.
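The chain construction below calls `vector_store.as_retriever()` with its default settings. If you want to control how many documents are pulled into the prompt context, LangChain retrievers generally accept a `search_kwargs` dictionary; the `k` value here is just an illustrative choice, not something the original notebook sets:


```python
# Illustrative variant: cap the retriever at the top 4 most similar documents.
# search_kwargs is forwarded to the vector store's underlying similarity search.
retriever = vector_store.as_retriever(search_kwargs={"k": 4})
```

A smaller `k` keeps the prompt shorter and cheaper, while a larger `k` gives the model more context at the cost of extra tokens.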
- - -```python -# Create RAG prompt template -rag_prompt = ChatPromptTemplate.from_messages([ - ("system", "You are a helpful assistant that answers questions based on the provided context."), - ("human", "Context: {context}\n\nQuestion: {question}") -]) - -# Create RAG chain -rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | rag_prompt - | llm - | StrOutputParser() -) -logging.info("Successfully created RAG chain") -``` - - 2025-09-18 11:18:34,032 - INFO - Successfully created RAG chain - - - -```python -try: - start_time = time.time() - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: Based on the provided context, Luke Littler's key achievements and records in his recent PDC World Championship match (second-round win against Ryan Meikle) were: - - * **Tournament Record Set Average:** He hit a tournament record 140.91 set average during the match. - * **Near Nine-Darter:** He was "millimetres away from a nine-darter" when he missed double 12. - * **Dominant Final Set:** He won the fourth and final set in just 32 darts (the minimum possible is 27), which included hitting four maximum 180s and clinching three straight legs in 11, 10, and 11 darts. - * **Overall High Average:** He maintained a high overall match average of 100.85. - RAG response generated in 0.49 seconds - - -## Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. 
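Before running the timing comparison below, you can also peek at what the cache has accumulated so far. This is a minimal sketch that reuses the `cluster` connection and the cache collection configured earlier; the exact document shape written by `CouchbaseCache` is an implementation detail of langchain-couchbase, so we only count entries here:


```python
# Illustrative: count the entries the LangChain cache has written to Couchbase.
cache_query = f"SELECT COUNT(*) AS cached FROM `{CB_BUCKET_NAME}`.`{SCOPE_NAME}`.`{CACHE_COLLECTION}`"
for row in cluster.query(cache_query):
    print(f"Cached LLM responses: {row['cached']}")
```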


```python
try:
    queries = [
        "What happened in the match between Fulham and Liverpool?",
        "What were Luke Littler's key achievements and records in his recent PDC World Championship match?",  # Repeated query
        "What happened in the match between Fulham and Liverpool?",  # Repeated query
    ]

    for i, query in enumerate(queries, 1):
        print(f"\nQuery {i}: {query}")
        start_time = time.time()

        response = rag_chain.invoke(query)
        elapsed_time = time.time() - start_time
        print(f"Response: {response}")
        print(f"Time taken: {elapsed_time:.2f} seconds")

except InternalServerFailureException as e:
    if "query request rejected" in str(e):
        print("Error: Search request was rejected due to rate limiting. Please try again later.")
    else:
        print(f"Internal server error occurred: {str(e)}")
except Exception as e:
    print(f"Unexpected error occurred: {str(e)}")
```

    
    Query 1: What happened in the match between Fulham and Liverpool?
    Response: In the match between Fulham and Liverpool, Liverpool played the majority of the game with 10 men after Andy Robertson received a red card in the 17th minute. Despite being a player down, Liverpool came from behind twice to secure a 2-2 draw. Diogo Jota scored an 86th-minute equalizer to earn Liverpool a point. The performance was praised for its resilience, with Fulham's Antonee Robinson noting that Liverpool "didn't feel like they had 10 men at all." Liverpool maintained over 60% possession and led in attacking metrics such as shots and chances. Both managers acknowledged the strong efforts of their teams in what was described as an enthralling encounter.
    Time taken: 4.65 seconds
    
    Query 2: What were Luke Littler's key achievements and records in his recent PDC World Championship match?
    Response: Based on the provided context, Luke Littler's key achievements and records in his recent PDC World Championship match (second-round win against Ryan Meikle) were:
    
    * **Tournament Record Set Average:** He hit a tournament record 140.91 set average during the match.
    * **Near Nine-Darter:** He was "millimetres away from a nine-darter" when he missed double 12.
    * **Dominant Final Set:** He won the fourth and final set in just 32 darts (the minimum possible is 27), which included hitting four maximum 180s and clinching three straight legs in 11, 10, and 11 darts.
    * **Overall High Average:** He maintained a high overall match average of 100.85.
    Time taken: 0.45 seconds
    
    Query 3: What happened in the match between Fulham and Liverpool?
    Response: In the match between Fulham and Liverpool, Liverpool played the majority of the game with 10 men after Andy Robertson received a red card in the 17th minute. Despite being a player down, Liverpool came from behind twice to secure a 2-2 draw. Diogo Jota scored an 86th-minute equalizer to earn Liverpool a point. The performance was praised for its resilience, with Fulham's Antonee Robinson noting that Liverpool "didn't feel like they had 10 men at all." Liverpool maintained over 60% possession and led in attacking metrics such as shots and chances. Both managers acknowledged the strong efforts of their teams in what was described as an enthralling encounter.
    Time taken: 1.15 seconds


## Conclusion
By following these steps, you'll have a fully functional semantic search engine that leverages the strengths of Couchbase and Deepseek (via OpenRouter).
This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you're a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_PydanticAI.md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_PydanticAI.md deleted file mode 100644 index 69ce9fc..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_PydanticAI.md +++ /dev/null @@ -1,618 +0,0 @@ ---- -# frontmatter -path: "/tutorial-pydantic-ai-couchbase-rag" -title: Retrieval-Augmented Generation (RAG) with Couchbase and PydanticAI -short_title: RAG with Couchbase and PydanticAI -description: - - Learn how to build a semantic search engine using Couchbase and PydanticAI. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with PydanticAI using tool calling. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using PydanticAI and Couchbase. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - LangChain - - OpenAI - - PydanticAI -sdk_language: - - python -length: 30 Mins ---




[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/pydantic_ai/RAG_with_Couchbase_and_PydanticAI.ipynb)

# Introduction
In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [OpenAI](https://openai.com) as the embedding and LLM provider, and [PydanticAI](https://ai.pydantic.dev) as an agent orchestrator. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch.

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively.

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start
## Create and Deploy Your Free Tier Operational Cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To know more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the `vector-search-testing` bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

# Setting the Stage: Installing Necessary Libraries
To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and OpenAI provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search.


```python
%pip install --quiet -U datasets==3.5.0 langchain-couchbase==0.3.0 langchain-openai==0.3.13 python-dotenv==1.1.0 pydantic-ai==0.1.1 ipywidgets==8.1.6
```

    Note: you may need to restart the kernel to use updated packages.


# Importing Necessary Libraries
The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models.


```python
import getpass
import json
import logging
import os
import time
from uuid import uuid4
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (InternalServerFailureException,
                                  QueryIndexAlreadyExistsException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore
from langchain_openai import OpenAIEmbeddings
from tqdm import tqdm

from dataclasses import dataclass
from pydantic_ai import Agent, RunContext
```

# Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.



```python
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)
```

# Loading Sensitive Information
In this section, we prompt the user to input essential configuration settings. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.

The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code.
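If you prefer a `.env` file over interactive prompts, it might look like the following. The values are illustrative placeholders; the variable names match the ones read by the code below:


```
OPENAI_API_KEY=sk-...
CB_HOST=couchbase://localhost
CB_USERNAME=Administrator
CB_PASSWORD=password
CB_BUCKET_NAME=vector-search-testing
INDEX_NAME=vector_search_pydantic_ai
SCOPE_NAME=shared
COLLECTION_NAME=pydantic_ai
```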


```python
load_dotenv()

OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API Key: ')

CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing'
INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_pydantic_ai): ') or 'vector_search_pydantic_ai'
SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: pydantic_ai): ') or 'pydantic_ai'

# Check if the variables are correctly loaded
if not OPENAI_API_KEY:
    raise ValueError("Missing OpenAI API Key")

if 'OPENAI_API_KEY' not in os.environ:
    os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY
```

# Connecting to the Couchbase Cluster
Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount.




```python
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")
```

    2025-04-11 13:54:19,537 - INFO - Successfully connected to Couchbase


# Setting Up Collections in Couchbase

The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase:

1. Bucket Creation:
   - Checks if specified bucket exists, creates it if not
   - Sets bucket properties like RAM quota (1024MB) and replication (disabled)
   - Note: Bucket creation from the SDK is not available on Capella; create the bucket manually from the Capella UI instead

2. Scope Management:
   - Verifies if requested scope exists within bucket
   - Creates new scope if needed (unless it's the default "_default" scope)

3. Collection Setup:
   - Checks for collection existence within scope
   - Creates collection if it doesn't exist
   - Waits 2 seconds for collection to be ready

Additional Tasks:
- Creates primary index on collection for query performance
- Clears any existing documents for clean state
- Implements comprehensive error handling and logging


```python
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        # Check if bucket exists, create if it doesn't
        try:
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' exists.")
        except Exception as e:
            logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...")
            bucket_settings = CreateBucketSettings(
                name=bucket_name,
                bucket_type='couchbase',
                ram_quota_mb=1024,
                flush_enabled=True,
                num_replicas=0
            )
            cluster.buckets().create_bucket(bucket_settings)
            time.sleep(2)  # Wait for bucket creation to complete and become available
            bucket = cluster.bucket(bucket_name)
            logging.info(f"Bucket '{bucket_name}' created successfully.")

        bucket_manager = bucket.collections()

        # Check if scope exists, create if it doesn't
        scopes = bucket_manager.get_all_scopes()
        scope_exists = any(scope.name == scope_name for scope in scopes)

        if not scope_exists and scope_name != "_default":
            logging.info(f"Scope '{scope_name}' does not exist. Creating it...")
            bucket_manager.create_scope(scope_name)
            logging.info(f"Scope '{scope_name}' created successfully.")

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            time.sleep(2)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        collection = bucket.scope(scope_name).collection(collection_name)
        time.sleep(2)  # Give the collection time to be ready for queries

        # Ensure primary index exists
        try:
            cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute()
            logging.info("Primary index present or created successfully.")
        except Exception as e:
            logging.warning(f"Error creating primary index: {str(e)}")

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")

setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
```

    2025-04-11 13:54:23,668 - INFO - Bucket 'vector-search-testing' does not exist. Creating it...


    2025-04-11 13:54:25,721 - INFO - Bucket 'vector-search-testing' created successfully.
    2025-04-11 13:54:25,728 - INFO - Scope 'shared' does not exist. Creating it...
    2025-04-11 13:54:25,777 - INFO - Scope 'shared' created successfully.
    2025-04-11 13:54:25,796 - INFO - Collection 'pydantic_ai' does not exist. Creating it...
    2025-04-11 13:54:27,843 - INFO - Collection 'pydantic_ai' created successfully.
    2025-04-11 13:54:28,120 - INFO - Primary index present or created successfully.
    2025-04-11 13:54:28,133 - INFO - All documents cleared from the collection.


# Loading Couchbase Vector Search Index

Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. 
This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -This vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `pydantic_ai`. The configuration is set up for vectors with exactly `1536 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly. - -For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). - - - -```python -# If you are running this script locally (not in Google Colab), uncomment the following line -# and provide the path to your index definition file. - -# index_definition_path = '/path_to_your_index_file/pydantic_ai_index.json' # Local setup: specify your file path here - -# # Version for Google Colab -# def load_index_definition_colab(): -# from google.colab import files -# print("Upload your index definition file") -# uploaded = files.upload() -# index_definition_path = list(uploaded.keys())[0] - -# try: -# with open(index_definition_path, 'r') as file: -# index_definition = json.load(file) -# return index_definition -# except Exception as e: -# raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Version for Local Environment -def load_index_definition_local(index_definition_path): - try: - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - return index_definition - except Exception as e: - raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Usage -# Uncomment the appropriate line based on your environment -# index_definition = load_index_definition_colab() -index_definition = load_index_definition_local('pydantic_ai_index.json') -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. - - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. 
Skipping creation/update.") - -except InternalServerFailureException as e: - error_message = str(e) - logging.error(f"InternalServerFailureException raised: {error_message}") - - try: - # Accessing the response_body attribute from the context - error_context = e.context - response_body = error_context.response_body - if response_body: - error_details = json.loads(response_body) - error_message = error_details.get('error', '') - - if "collection: 'pydantic_ai' doesn't belong to scope: 'shared'" in error_message: - raise ValueError("Collection 'pydantic_ai' does not belong to scope 'shared'. Please check the collection and scope names.") - - except ValueError as ve: - logging.error(str(ve)) - raise - - except Exception as json_error: - logging.error(f"Failed to parse the error message: {json_error}") - raise RuntimeError(f"Internal server error while creating/updating search index: {error_message}") -``` - - 2025-04-11 13:54:41,157 - INFO - Creating new index 'vector-search-testing.shared.vector_search_pydantic_ai'... - 2025-04-11 13:54:41,316 - INFO - Index 'vector-search-testing.shared.vector_search_pydantic_ai' successfully created/updated. - - -# Creating OpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. By creating embeddings using OpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - -```python -try: - embeddings = OpenAIEmbeddings( - model="text-embedding-3-small", - api_key=OPENAI_API_KEY, - ) - logging.info("Successfully created OpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating OpenAIEmbeddings: {str(e)}") -``` - - 2025-04-11 13:55:10,426 - INFO - Successfully created OpenAIEmbeddings - - -# Setting Up the Couchbase Vector Store -The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched. - - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") - -``` - - 2025-04-11 13:55:12,849 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. 
The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively.

The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.


```python
try:
    news_dataset = load_dataset(
        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
    )
    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
except Exception as e:
    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
```

    2025-04-11 13:55:22,967 - INFO - Successfully loaded the BBC News dataset with 2687 rows.


    Loaded the BBC News dataset with 2687 rows


## Cleaning up the Data
We will use the content of the news articles for our RAG system.

The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.


```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.


## Saving Data to the Vector Store
With the vector store set up, the next step is to populate it with data. We save the BBC News articles to the vector store, generating embeddings for each article with LangChain so they can be used for semantic search. One of the articles is larger than the maximum number of tokens our embedding model accepts; if we wanted to ingest it, we could split it into parts and ingest those separately. Since it is only a single document, for simplicity we exclude it from the ingestion process.


```python
# Save the current logging level
current_logging_level = logging.getLogger().getEffectiveLevel()

# Set logging level to CRITICAL to suppress lower level logs
logging.getLogger().setLevel(logging.CRITICAL)

articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles
    )
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")

# Restore the original logging level
logging.getLogger().setLevel(current_logging_level)
```

# PydanticAI: An Introduction
From [PydanticAI](https://ai.pydantic.dev/)'s website:

> PydanticAI is a Python agent framework designed to make it less painful to build production grade applications with Generative AI.

PydanticAI allows us to define agents and tools easily to create Gen-AI apps in an innovative and painless manner. Some of its features are:
- Built by the Pydantic Team: Built by the team behind Pydantic (the validation layer of the OpenAI SDK, the Anthropic SDK, LangChain, LlamaIndex, AutoGPT, Transformers, CrewAI, Instructor and many more).

- Model-agnostic: Supports OpenAI, Anthropic, Gemini, Deepseek, Ollama, Groq, Cohere, and Mistral, and there is a simple interface to implement support for other models. 
-

- Type-safe: Designed to make type checking as powerful and informative as possible for you.

- Python-centric Design: Leverages Python's familiar control flow and agent composition to build your AI-driven projects, making it easy to apply standard Python best practices you'd use in any other (non-AI) project.

- Structured Responses: Harnesses the power of Pydantic to validate and structure model outputs, ensuring responses are consistent across runs.

- Dependency Injection System: Offers an optional dependency injection system to provide data and services to your agent's system prompts, tools and result validators. This is useful for testing and eval-driven iterative development.

- Streamed Responses: Provides the ability to stream LLM outputs continuously, with immediate validation, ensuring rapid and accurate results.

- Graph Support: Pydantic Graph provides a powerful way to define graphs using typing hints, which is useful in complex applications where standard control flow can degrade to spaghetti code.

# Building a RAG Agent using PydanticAI

PydanticAI makes heavy use of dependency injection to provide data and services to your agent's system prompts and tools. We define dependencies using a `dataclass`, which serves as a container for our dependencies.

In our case, the only dependency our agent needs is the `CouchbaseSearchVectorStore` instance. However, we still use a `dataclass` as it is good practice: if we wish to add more dependencies in the future, we can simply add more fields to the `Deps` dataclass.

We also initialize an agent backed by the GPT-4o model. PydanticAI supports many other LLM providers, including Anthropic, Google, and Cohere, which can also be used here. While initializing the agent, we also pass the type of the dependencies. This is mainly used for type checking and is not actually used at runtime.


```python
@dataclass
class Deps:
    vector_store: CouchbaseSearchVectorStore

agent = Agent("openai:gpt-4o", deps_type=Deps)
```

# Defining the Vector Store as a Tool
PydanticAI has the concept of `function tools`, which are functions that can be called by LLMs to retrieve extra information that can help form a better response.

We can perform RAG by creating a tool which retrieves documents that are semantically similar to the query, and allowing the agent to call the tool when required. We add the function as a tool using the `@agent.tool` decorator.

Notice that we also add the `context` parameter, which contains the dependencies that are passed to the tool (in this case, the only dependency is the vector store).


```python
@agent.tool
async def retrieve(context: RunContext[Deps], search_query: str) -> str:
    """Retrieve news data based on a search query.

    Args:
        context: The call context
        search_query: The search query
    """
    search_results = context.deps.vector_store.similarity_search_with_score(search_query, k=5)
    return "\n\n".join(
        f"# Documents:\n{doc.page_content}"
        for doc, score in search_results
    )
```

Finally, we create a function that allows us to define our dependencies and run our agent.


```python
async def run_agent(question: str):
    deps = Deps(
        vector_store=vector_store,
    )
    answer = await agent.run(question, deps=deps)
    return answer
```

# Running our Agent
We have now finished setting up our vector store and agent! The system is now ready to accept queries. 
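
The next cell uses top-level `await`, which works in Jupyter and IPython. If you are instead running this code as a standalone Python script, a minimal sketch of the equivalent call, assuming the same `run_agent` helper defined above (the query string here is just an illustrative example):

```python
# Standalone-script variant: asyncio.run() drives the event loop,
# since top-level await is only available in notebook environments.
import asyncio

output = asyncio.run(run_agent("What did the BBC report about Manchester City?"))
print(output.data)
```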
-

```python
query = "What was manchester city manager pep guardiola's reaction to the team's current form?"
output = await run_agent(query)

print("=" * 20, "Agent Output", "=" * 20)
print(output.data)
```

    2025-04-11 13:56:53,839 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
    2025-04-11 13:56:54,485 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK"
    2025-04-11 13:57:01,928 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"


    ==================== Agent Output ====================
    Pep Guardiola has expressed a mix of determination and concern regarding Manchester City's current form. He acknowledged the personal impact of the team's downturn, admitting that the situation has affected his sleep and diet due to the worst run of results he has ever faced in his managerial career. Guardiola described his state of mind as "ugly," noting the team's precarious position in competitions and the need to defend better and avoid mistakes.
    
    Despite these challenges, Guardiola remains committed to finding solutions, emphasizing the need to improve defensive concepts and restore the team's intensity and form. He acknowledged the errors from some of the best players in the world and expressed a need for the team to stay positive and for players to have the necessary support to overcome their current struggles.
    
    Moreover, Guardiola expressed a pragmatic view of the situation, accepting that the team must "survive" the season and acknowledging a potential need for a significant rebuild to address the challenges they're facing. As a testament to his commitment, he noted his intention to continue shaping the club during his newly extended contract period. Throughout, he reiterated his belief in the team and emphasized the need to find a way forward.


# Inspecting the Agent
We can use the `all_messages()` method in the output object to observe how the agent and tools work.

In the cells that follow, we see an extremely detailed list of all the model's messages and tool calls, step by step:
1. The `UserPromptPart`, which consists of the query the user sends to the agent.
2. The agent calls the `retrieve` tool in the `ToolCallPart` message. This includes the `search_query` argument. Couchbase uses this `search_query` to perform semantic search over all the ingested news articles.
3. The `retrieve` tool returns a `ToolReturnPart` object with all the context required for the model to answer the user's query. The retrieved documents are truncated in the output because a large amount of context was retrieved.
4. The final message is the LLM-generated response with the added context, which is sent back to the user. 
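
Before we print the full message dump below, it can be handy to pull out just the tool calls. A minimal sketch, assuming pydantic-ai's `ToolCallPart` message-part type (visible in the repr output that follows):

```python
# List only the tool calls the agent made, with their arguments.
# ToolCallPart is the message part pydantic-ai uses for tool invocations.
from pydantic_ai.messages import ToolCallPart

for message in output.all_messages():
    for part in message.parts:
        if isinstance(part, ToolCallPart):
            print(f"Tool called: {part.tool_name} with args: {part.args}")
```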
- - -```python -from pprint import pprint - -for idx, message in enumerate(output.all_messages(), start=1): - print(f"Step {idx}:") - pprint(message.__repr__()) - print("=" * 50) -``` - - Step 1: - ('ModelRequest(parts=[UserPromptPart(content="What was manchester city manager ' - 'pep guardiola\'s reaction to the team\'s current form?", ' - 'timestamp=datetime.datetime(2025, 4, 11, 8, 26, 52, 836357, ' - "tzinfo=datetime.timezone.utc), part_kind='user-prompt')], kind='request')") - ================================================== - Step 2: - ("ModelResponse(parts=[ToolCallPart(tool_name='retrieve', " - 'args=\'{"search_query":"Pep Guardiola reaction to Manchester City current ' - 'form"}\', tool_call_id=\'call_oo4Jjn93VkRJ3q9PnAwkt3xm\', ' - "part_kind='tool-call')], model_name='gpt-4o-2024-08-06', " - 'timestamp=datetime.datetime(2025, 4, 11, 8, 26, 53, ' - "tzinfo=datetime.timezone.utc), kind='response')") - ================================================== - Step 3: - ("ModelRequest(parts=[ToolReturnPart(tool_name='retrieve', content='# " - 'Documents:\\nManchester City boss Pep Guardiola has won 18 trophies since he ' - 'arrived at the club in 2016\\n\\nManchester City boss Pep Guardiola says he ' - 'is "fine" despite admitting his sleep and diet are being affected by the ' - 'worst run of results in his entire managerial career. In an interview with ' - 'former Italy international Luca Toni for Amazon Prime Sport before ' - "Wednesday\\'s Champions League defeat by Juventus, Guardiola touched on the " - "personal impact City\\'s sudden downturn in form has had. Guardiola said his " - 'state of mind was "ugly", that his sleep was "worse" and he was eating ' - "lighter as his digestion had suffered. City go into Sunday\\'s derby against " - - ... (output truncated for brevity) - diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_SmolAgents.md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_SmolAgents.md deleted file mode 100644 index e4c8280..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_SmolAgents.md +++ /dev/null @@ -1,650 +0,0 @@ ---- -# frontmatter -path: "/tutorial-smolagents-couchbase-rag" -title: Retrieval-Augmented Generation (RAG) with Couchbase and smolagents -short_title: RAG with Couchbase and smolagents -description: - - Learn how to build a semantic search engine using Couchbase and Hugging Face smolagents. - - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with smolagents using tool calling. - - You'll understand how to perform Retrieval-Augmented Generation (RAG) using smolagents and Couchbase. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - LangChain - - OpenAI - - smolagents -sdk_language: - - python -length: 30 Mins ---- - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/smolagents/RAG_with_Couchbase_and_SmolAgents.ipynb) - -# Introduction -In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [OpenAI](https://openai.com) as the embedding and LLM provider, and [Hugging Face smolagents](https://huggingface.co/docs/smolagents/en/index) as an agent framework. 
Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. - -# How to run this tutorial - -This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. - -You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment. - -# Before you start -## Get Credentials for OpenAI -Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials. -## Create and Deploy Your Free Tier Operational cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint. - -To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met. - -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. - -# Setting the Stage: Installing Necessary Libraries -To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks. Each library has a specific role: Couchbase libraries manage database operations, LangChain handles AI model integrations, and OpenAI provides advanced AI models for generating embeddings and understanding natural language. By setting up these libraries, we ensure our environment is equipped to handle the data-intensive and computationally complex tasks required for semantic search. - - -```python -%pip install --quiet -U datasets==3.5.0 langchain-couchbase==0.3.0 langchain-openai==0.3.13 python-dotenv==1.1.0 smolagents==1.13.0 ipywidgets==8.1.6 -``` - -# Importing Necessary Libraries -The script starts by importing a series of libraries required for various tasks, including handling JSON, logging, time tracking, Couchbase connections, embedding generation, and dataset loading. These libraries provide essential functions for working with data, managing database connections, and processing machine learning models. 
- - -```python -import getpass -import json -import logging -import os -import time -from datetime import timedelta - -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.exceptions import (InternalServerFailureException, - ServiceUnavailableException, - QueryIndexAlreadyExistsException) -from couchbase.management.buckets import CreateBucketSettings -from couchbase.management.search import SearchIndex -from couchbase.options import ClusterOptions -from datasets import load_dataset -from dotenv import load_dotenv -from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore -from langchain_openai import OpenAIEmbeddings - -from smolagents import Tool, OpenAIServerModel, ToolCallingAgent -``` - -# Setup Logging -Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script. - - - -```python -logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True) -``` - -# Loading Sensitive Information -In this section, we prompt the user to input essential configuration settings needed. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security. - -The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. - - -```python -load_dotenv() - -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API Key: ') - -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing' -INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_smolagents): ') or 'vector_search_smolagents' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: smolagents): ') or 'smolagents' - -# Check if the variables are correctly loaded -if not OPENAI_API_KEY: - raise ValueError("Missing OpenAI API Key") - -if 'OPENAI_API_KEY' not in os.environ: - os.environ['OPENAI_API_KEY'] = OPENAI_API_KEY -``` - -# Connecting to the Couchbase Cluster -Connecting to a Couchbase cluster is the foundation of our project. Couchbase will serve as our primary data store, handling all the storage and retrieval operations required for our semantic search engine. 
By establishing this connection, we enable our application to interact with the database, allowing us to perform operations such as storing embeddings, querying data, and managing collections. This connection is the gateway through which all data will flow, so ensuring it's set up correctly is paramount. - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-02-28 10:30:17,515 - INFO - Successfully connected to Couchbase - - -# Setting Up Collections in Couchbase -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: - -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: - -1. Main collection for vector embeddings -2. Cache collection for storing results - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. 
Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -``` - - 2025-02-28 10:30:20,855 - INFO - Bucket 'vector-search-testing' exists. - 2025-02-28 10:30:21,350 - INFO - Collection 'smolagents' does not exist. Creating it... - 2025-02-28 10:30:21,619 - INFO - Collection 'smolagents' created successfully. - 2025-02-28 10:30:26,886 - INFO - Primary index present or created successfully. - 2025-02-28 10:30:26,938 - INFO - All documents cleared from the collection. - - - - - - - - - -# Loading Couchbase Vector Search Index - -Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity. - -This vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `smolagents`. The configuration is set up for vectors with exactly `1536 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly. - -For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html). - - - -```python -# If you are running this script locally (not in Google Colab), uncomment the following line -# and provide the path to your index definition file. 
- -# index_definition_path = '/path_to_your_index_file/smolagents_index.json' # Local setup: specify your file path here - -# # Version for Google Colab -# def load_index_definition_colab(): -# from google.colab import files -# print("Upload your index definition file") -# uploaded = files.upload() -# index_definition_path = list(uploaded.keys())[0] - -# try: -# with open(index_definition_path, 'r') as file: -# index_definition = json.load(file) -# return index_definition -# except Exception as e: -# raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Version for Local Environment -def load_index_definition_local(index_definition_path): - try: - with open(index_definition_path, 'r') as file: - index_definition = json.load(file) - return index_definition - except Exception as e: - raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}") - -# Usage -# Uncomment the appropriate line based on your environment -# index_definition = load_index_definition_colab() -index_definition = load_index_definition_local('smolagents_index.json') -``` - -# Creating or Updating Search Indexes - -With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine. - - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-02-28 10:30:32,890 - INFO - Creating new index 'vector-search-testing.shared.vector_search_smolagents'... - 2025-02-28 10:30:33,058 - INFO - Index 'vector-search-testing.shared.vector_search_smolagents' successfully created/updated. - - -# Creating OpenAI Embeddings -Embeddings are at the heart of semantic search. They are numerical representations of text that capture the semantic meaning of the words and phrases. Unlike traditional keyword-based search, which looks for exact matches, embeddings allow our search engine to understand the context and nuances of language, enabling it to retrieve documents that are semantically similar to the query, even if they don't contain the exact keywords. 
By creating embeddings using OpenAI, we equip our search engine with the ability to understand and process natural language in a way that's much closer to how humans understand language. This step transforms our raw text data into a format that the search engine can use to find and rank relevant documents. - - -```python -try: - embeddings = OpenAIEmbeddings( - model="text-embedding-3-small", - api_key=OPENAI_API_KEY, - ) - logging.info("Successfully created OpenAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating OpenAIEmbeddings: {str(e)}") -``` - - 2025-02-28 10:30:36,983 - INFO - Successfully created OpenAIEmbeddings - - -# Setting Up the Couchbase Vector Store -A vector store is where we'll keep our embeddings. Unlike the FTS index, which is used for text-based search, the vector store is specifically designed to handle embeddings and perform similarity searches. When a user inputs a query, the search engine converts the query into an embedding and compares it against the embeddings stored in the vector store. This allows the engine to find documents that are semantically similar to the query, even if they don't contain the exact same words. By setting up the vector store in Couchbase, we create a powerful tool that enables our search engine to understand and retrieve information based on the meaning and context of the query, rather than just the specific words used. - - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") - -``` - - 2025-02-28 10:30:40,503 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. - -The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version. - - -```python -try: - news_dataset = load_dataset( - "RealTimeData/bbc_news_alltime", "2024-12", split="train" - ) - print(f"Loaded the BBC News dataset with {len(news_dataset)} rows") - logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.") -except Exception as e: - raise ValueError(f"Error loading the BBC News dataset: {str(e)}") -``` - - 2025-02-28 10:30:51,981 - INFO - Successfully loaded the BBC News dataset with 2687 rows. - - - Loaded the BBC News dataset with 2687 rows - - -## Cleaning up the Data -We will use the content of the news articles for our RAG system. - -The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system. 
-

```python
news_articles = news_dataset["content"]
unique_articles = set()
for article in news_articles:
    if article:
        unique_articles.add(article)
unique_news_articles = list(unique_articles)
print(f"We have {len(unique_news_articles)} unique articles in our database.")
```

    We have 1749 unique articles in our database.


## Saving Data to the Vector Store
To efficiently handle the large number of articles, we process them in batches of 100 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.

We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's add_texts method, we add the filtered articles to our vector database. The batch_size parameter controls how many articles are processed in each iteration.

This approach offers several benefits:

1. Memory Efficiency: Processing in smaller batches prevents memory overload
2. Error Handling: If an error occurs, only the current batch is affected
3. Progress Tracking: Easier to monitor and track the ingestion progress
4. Resource Management: Better control over CPU and network resource utilization

We use a conservative batch size of 100 to ensure reliable operation. The optimal batch size depends on many factors including:

- Document sizes being inserted
- Available system resources
- Network conditions
- Concurrent workload

Consider measuring performance with your specific workload before adjusting.


```python
# Save the current logging level
current_logging_level = logging.getLogger().getEffectiveLevel()

# Set logging level to CRITICAL to suppress lower level logs
logging.getLogger().setLevel(logging.CRITICAL)

articles = [article for article in unique_news_articles if article and len(article) <= 50000]

try:
    vector_store.add_texts(
        texts=articles,
        batch_size=100
    )
except Exception as e:
    raise ValueError(f"Failed to save documents to vector store: {str(e)}")

# Restore the original logging level
logging.getLogger().setLevel(current_logging_level)
```

# smolagents: An Introduction
[smolagents](https://huggingface.co/docs/smolagents/en/index) is an agentic framework by Hugging Face for easy creation of agents in a few lines of code.

Some of the features of smolagents are:

- ✨ Simplicity: the logic for agents fits in ~1,000 lines of code (see agents.py). We kept abstractions to their minimal shape above raw code!

- 🧑‍💻 First-class support for Code Agents. Our CodeAgent writes its actions in code (as opposed to "agents being used to write code"). To make it secure, we support executing in sandboxed environments via E2B.

- 🤗 Hub integrations: you can share/pull tools to/from the Hub, and more is to come!

- 🌐 Model-agnostic: smolagents supports any LLM. It can be a local transformers or ollama model, one of many providers on the Hub, or any model from OpenAI, Anthropic and many others via our LiteLLM integration.

- 👁️ Modality-agnostic: Agents support text, vision, video, even audio inputs! Cf this tutorial for vision.

- 🛠️ Tool-agnostic: you can use tools from LangChain, Anthropic's MCP, you can even use a Hub Space as a tool.

# Building a RAG Agent using smolagents

smolagents allows users to define their own tools for the agent to use. These tools can be of two types:
1. 
Tools defined as classes: These tools are subclassed from the `Tool` class and must override the `forward` method, which is called when the tool is used.
2. Tools defined as functions: These are simple functions that are called when the tool is used, and are decorated with the `@tool` decorator.

In our case, we will use the first method, and we define our `RetrieverTool` below. We define a name, a description and a dictionary of inputs that the tool accepts. This helps the LLM properly identify and use the tool.

The `RetrieverTool` is simple: it takes a query generated by the user, and uses Couchbase's performant vector search service under the hood to search for semantically similar documents to the query. The LLM can then use this context to answer the user's question.


```python
class RetrieverTool(Tool):
    name = "retriever"
    description = "Uses semantic search to retrieve the BBC news articles that could be most relevant to answer your query."
    inputs = {
        "query": {
            "type": "string",
            "description": "The query to perform. This should be semantically close to your target documents. Use the affirmative form rather than a question.",
        }
    }
    output_type = "string"

    def __init__(self, vector_store: CouchbaseSearchVectorStore, **kwargs):
        super().__init__(**kwargs)
        self.vector_store = vector_store

    def forward(self, query: str) -> str:
        assert isinstance(query, str), "Query must be a string"

        docs = self.vector_store.similarity_search_with_score(query, k=5)
        return "\n\n".join(
            f"# Documents:\n{doc.page_content}"
            for doc, score in docs
        )

retriever_tool = RetrieverTool(vector_store)
```

# Defining Our Agent
smolagents has predefined agent configurations that we can use. We use the `ToolCallingAgent`, which writes its tool calls in a JSON format. Alternatively, there is also a `CodeAgent`, in which the LLM defines its functions in code.

The `CodeAgent` offers benefits in certain challenging scenarios: it can lead to [higher performance in difficult benchmarks](https://huggingface.co/papers/2411.01747) and use [30% fewer steps to solve problems](https://huggingface.co/papers/2402.01030). However, since our use case is just a simple RAG tool, a `ToolCallingAgent` will suffice.


```python
agent = ToolCallingAgent(
    tools=[retriever_tool],
    model=OpenAIServerModel(
        model_id="gpt-4o-2024-08-06",
        api_key=OPENAI_API_KEY,
    ),
    max_steps=4,
    verbosity_level=2
)
```

# Running our Agent
We have now finished setting up our vector store and agent! The system is now ready to accept queries.


```python
query = "What was manchester city manager pep guardiola's reaction to the team's current form?"

agent_output = agent.run(query)
```


-
╭──────────────────────────────────────────────────── New run ────────────────────────────────────────────────────╮
-                                                                                                                 
- What was manchester city manager pep guardiola's reaction to the team's current form?                           
-                                                                                                                 
-╰─ OpenAIServerModel - gpt-4o-2024-08-06 ─────────────────────────────────────────────────────────────────────────╯
-
- - - - -
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 1 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-
- - - - 2025-02-28 10:32:28,032 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" - - - -
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-│ Calling tool: 'retriever' with arguments: {'query': "Pep Guardiola's reaction to Manchester City's current      │
-│ form"}                                                                                                          │
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
- - - - 2025-02-28 10:32:28,466 - INFO - HTTP Request: POST https://api.openai.com/v1/embeddings "HTTP/1.1 200 OK" - - - -
[Step 0: Duration 2.25 seconds| Input tokens: 1,010 | Output tokens: 23]
-
- - - - -
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ Step 2 ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-
- - - - 2025-02-28 10:32:31,724 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK" - - - -
╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
-│ Calling tool: 'final_answer' with arguments: {'answer': 'Manchester City manager Pep Guardiola has expressed a  │
-│ mix of concern and determination regarding the team\'s current form. Guardiola admitted that this is the worst  │
-│ run of results in his managerial career and that it has affected his sleep and diet. He described his state of  │
-│ mind as "ugly" and acknowledged that City needs to defend better and avoid making mistakes. Despite his         │
-│ personal challenges, Guardiola stated that he is "fine" and focused on finding solutions.\n\nGuardiola also     │
-│ took responsibility for the team\'s struggles, stating he is "not good enough" and has to find solutions. He    │
-│ expressed self-doubt but is striving to improve the team\'s situation step by step. Guardiola has faced         │
-│ criticism due to the team\'s poor form, which has seen them lose several matches and fall behind in the title   │
-│ race.\n\nHe emphasized the need to restore their defensive strength and regain confidence in their play.        │
-│ Guardiola is planning a significant rebuild of the squad to address these challenges, aiming to replace several │
-│ regular starters and emphasize improvements in the team\'s intensity and defensive concepts.'}                  │
-╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
-
- - - - -
Final answer: Manchester City manager Pep Guardiola has expressed a mix of concern and determination regarding the 
-team's current form. Guardiola admitted that this is the worst run of results in his managerial career and that it 
-has affected his sleep and diet. He described his state of mind as "ugly" and acknowledged that City needs to 
-defend better and avoid making mistakes. Despite his personal challenges, Guardiola stated that he is "fine" and 
-focused on finding solutions.
-
-Guardiola also took responsibility for the team's struggles, stating he is "not good enough" and has to find 
-solutions. He expressed self-doubt but is striving to improve the team's situation step by step. Guardiola has 
-faced criticism due to the team's poor form, which has seen them lose several matches and fall behind in the title 
-race.
-
-He emphasized the need to restore their defensive strength and regain confidence in their play. Guardiola is 
-planning a significant rebuild of the squad to address these challenges, aiming to replace several regular starters
-and emphasize improvements in the team's intensity and defensive concepts.
-
- - - - -
[Step 1: Duration 2.74 seconds| Input tokens: 7,162 | Output tokens: 241]
-
-


# Analyzing the Agent
When the agent runs, smolagents prints out the steps that the agent takes along with the tools called in each step. In the above tool call, two steps occur:

**Step 1**: First, the agent determines that it requires a tool to be used, and the `retriever` tool is called. The agent also specifies the query parameter for the tool (a string). The tool returns semantically similar documents to the query from Couchbase's vector store.

**Step 2**: Next, the agent determines that the context retrieved from the tool is sufficient to answer the question. It then calls the `final_answer` tool, which is predefined for each agent: this tool is called when the agent returns the final answer to the user. In this step, the LLM answers the user's query from the context retrieved in step 1 and passes it to the `final_answer` tool, at which point the agent's execution ends.

# Conclusion

By following these steps, you’ll have a fully functional agentic RAG system that leverages the strengths of Couchbase and smolagents, along with OpenAI. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you’re a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, RAG-driven chat system.

diff --git a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Voyage.md b/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Voyage.md
deleted file mode 100644
index 587ee3e..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/RAG_with_Couchbase_and_Voyage.md
+++ /dev/null
@@ -1,691 +0,0 @@
----
-# frontmatter
-path: "/tutorial-openai-voyage-couchbase-rag"
-title: Retrieval-Augmented Generation (RAG) with Couchbase, OpenAI, and Voyage
-short_title: RAG with Couchbase, OpenAI, and Voyage
-description:
-  - Learn how to build a semantic search engine using Couchbase, OpenAI, and Voyage
-  - This tutorial demonstrates how to integrate Couchbase's vector search capabilities with Voyage embeddings and use OpenAI as the language model.
-  - You'll understand how to perform Retrieval-Augmented Generation (RAG) using LangChain and Couchbase.
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - Artificial Intelligence
-  - LangChain
-  - OpenAI
-sdk_language:
-  - python
-length: 60 Mins
----




[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/voyage/RAG_with_Couchbase_and_Voyage.ipynb)

# Introduction
In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database, [Voyage](https://www.voyageai.com/) as the AI-powered embedding provider, and [OpenAI](https://openai.com/) as the language model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval. This tutorial is designed to be beginner-friendly, with clear, step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system from scratch. 
-

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/voyage/RAG_with_Couchbase_and_Voyage.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start

## Get Credentials for VoyageAI and OpenAI

* Please follow the [instructions](https://platform.openai.com/docs/quickstart) to generate the OpenAI credentials.
* Please follow the [instructions](https://docs.voyageai.com/docs/api-key-and-installation) to generate the VoyageAI credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.

# Setting the Stage: Installing Necessary Libraries
To build our semantic search engine, we need a robust set of tools. The libraries we install handle everything from connecting to databases to performing complex machine learning tasks.


```python
%pip install --quiet datasets==3.5.0 langchain-couchbase==0.3.0 langchain-voyageai==0.1.4 langchain-openai==0.3.13
```

    Note: you may need to restart the kernel to use updated packages.


# Importing Necessary Libraries
This block imports all the required libraries and modules used in the notebook. These include libraries for environment management, data handling, natural language processing, interaction with Couchbase, and embeddings generation. Each library serves a specific function, such as managing environment variables, handling datasets, or interacting with the Couchbase database. 
-

```python
import json
import logging
import os
import time
import getpass
from datetime import timedelta
from dotenv import load_dotenv

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (CouchbaseException,
                                  InternalServerFailureException,
                                  QueryIndexAlreadyExistsException,
                                  ServiceUnavailableException)
from couchbase.management.buckets import CreateBucketSettings
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from langchain_core.documents import Document
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore
from langchain_openai import ChatOpenAI
from langchain_voyageai import VoyageAIEmbeddings
```

    /Users/aayush.tyagi/Documents/AI/vector-search-cookbook/.venv/lib/python3.12/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html
      from .autonotebook import tqdm as notebook_tqdm


# Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.


```python
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Set the logging from the httpx library to CRITICAL to avoid excessive logging
logging.getLogger('httpx').setLevel(logging.CRITICAL)
```

# Loading Sensitive Information
In this section, we prompt the user to input essential configuration settings needed for integrating Couchbase with the VoyageAI and OpenAI APIs. These settings include sensitive information like API keys, database credentials, and specific configuration names. Instead of hardcoding these details into the script, we request the user to provide them at runtime, ensuring flexibility and security.

The script also validates that all required inputs are provided, raising an error if any crucial information is missing. This approach ensures that your integration is both secure and correctly configured without hardcoding sensitive information, enhancing the overall security and maintainability of your code. 
- - -```python -load_dotenv() - -VOYAGE_API_KEY = os.getenv('VOYAGE_API_KEY') or getpass.getpass('Enter your VoyageAI API key: ') -OPENAI_API_KEY = os.getenv('OPENAI_API_KEY') or getpass.getpass('Enter your OpenAI API key: ') -CB_HOST = os.getenv('CB_HOST') or input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost' -CB_USERNAME = os.getenv('CB_USERNAME') or input('Enter your Couchbase username (default: Administrator): ') or 'Administrator' -CB_PASSWORD = os.getenv('CB_PASSWORD') or getpass.getpass('Enter your Couchbase password (default: password): ') or 'password' -CB_BUCKET_NAME = os.getenv('CB_BUCKET_NAME') or input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing' -INDEX_NAME = os.getenv('INDEX_NAME') or input('Enter your index name (default: vector_search_voyage): ') or 'vector_search_voyage' -SCOPE_NAME = os.getenv('SCOPE_NAME') or input('Enter your scope name (default: shared): ') or 'shared' -COLLECTION_NAME = os.getenv('COLLECTION_NAME') or input('Enter your collection name (default: voyage): ') or 'voyage' -CACHE_COLLECTION = os.getenv('CACHE_COLLECTION') or input('Enter your cache collection name (default: cache): ') or 'cache' - -# Verifying that essential environment variables are set -if not VOYAGE_API_KEY: - raise ValueError("VOYAGE_API_KEY is required.") -if not OPENAI_API_KEY: - raise ValueError("OPENAI_API_KEY is required.") -``` - -# Connect to Couchbase -The script attempts to establish a connection to the Couchbase database using the credentials retrieved from the environment variables. Couchbase is a NoSQL database known for its flexibility, scalability, and support for various data models, including document-based storage. The connection is authenticated using a username and password, and the script waits until the connection is fully established before proceeding. - - - - - -```python -try: - auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD) - options = ClusterOptions(auth) - cluster = Cluster(CB_HOST, options) - cluster.wait_until_ready(timedelta(seconds=5)) - logging.info("Successfully connected to Couchbase") -except Exception as e: - raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}") -``` - - 2025-02-24 01:02:11,426 - INFO - Successfully connected to Couchbase - - -## Setting Up Collections in Couchbase - -The setup_collection() function handles creating and configuring the hierarchical data organization in Couchbase: - -1. Bucket Creation: - - Checks if specified bucket exists, creates it if not - - Sets bucket properties like RAM quota (1024MB) and replication (disabled) - - Note: You will not be able to create a bucket on Capella - -2. Scope Management: - - Verifies if requested scope exists within bucket - - Creates new scope if needed (unless it's the default "_default" scope) - -3. Collection Setup: - - Checks for collection existence within scope - - Creates collection if it doesn't exist - - Waits 2 seconds for collection to be ready - -Additional Tasks: -- Creates primary index on collection for query performance -- Clears any existing documents for clean state -- Implements comprehensive error handling and logging - -The function is called twice to set up: -1. Main collection for vector embeddings -2. 
Cache collection for storing results - - - -```python -def setup_collection(cluster, bucket_name, scope_name, collection_name): - try: - # Check if bucket exists, create if it doesn't - try: - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' exists.") - except Exception as e: - logging.info(f"Bucket '{bucket_name}' does not exist. Creating it...") - bucket_settings = CreateBucketSettings( - name=bucket_name, - bucket_type='couchbase', - ram_quota_mb=1024, - flush_enabled=True, - num_replicas=0 - ) - cluster.buckets().create_bucket(bucket_settings) - time.sleep(2) # Wait for bucket creation to complete and become available - bucket = cluster.bucket(bucket_name) - logging.info(f"Bucket '{bucket_name}' created successfully.") - - bucket_manager = bucket.collections() - - # Check if scope exists, create if it doesn't - scopes = bucket_manager.get_all_scopes() - scope_exists = any(scope.name == scope_name for scope in scopes) - - if not scope_exists and scope_name != "_default": - logging.info(f"Scope '{scope_name}' does not exist. Creating it...") - bucket_manager.create_scope(scope_name) - logging.info(f"Scope '{scope_name}' created successfully.") - - # Check if collection exists, create if it doesn't - collections = bucket_manager.get_all_scopes() - collection_exists = any( - scope.name == scope_name and collection_name in [col.name for col in scope.collections] - for scope in collections - ) - - if not collection_exists: - logging.info(f"Collection '{collection_name}' does not exist. Creating it...") - bucket_manager.create_collection(scope_name, collection_name) - logging.info(f"Collection '{collection_name}' created successfully.") - else: - logging.info(f"Collection '{collection_name}' already exists. Skipping creation.") - - # Wait for collection to be ready - collection = bucket.scope(scope_name).collection(collection_name) - time.sleep(2) # Give the collection time to be ready for queries - - # Ensure primary index exists - try: - cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute() - logging.info("Primary index present or created successfully.") - except Exception as e: - logging.warning(f"Error creating primary index: {str(e)}") - - # Clear all documents in the collection - try: - query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`" - cluster.query(query).execute() - logging.info("All documents cleared from the collection.") - except Exception as e: - logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.") - - return collection - except Exception as e: - raise RuntimeError(f"Error setting up collection: {str(e)}") - -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME) -setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION) - -``` - - 2025-02-24 01:02:12,840 - INFO - Bucket 'vector-search-testing' exists. - 2025-02-24 01:02:15,328 - INFO - Collection 'voyage' already exists. Skipping creation. - 2025-02-24 01:02:18,539 - INFO - Primary index present or created successfully. - 2025-02-24 01:02:21,013 - INFO - All documents cleared from the collection. - 2025-02-24 01:02:21,014 - INFO - Bucket 'vector-search-testing' exists. - 2025-02-24 01:02:23,506 - INFO - Collection 'cache' already exists. Skipping creation. - 2025-02-24 01:02:26,647 - INFO - Primary index present or created successfully. - 2025-02-24 01:02:26,913 - INFO - All documents cleared from the collection. 
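-
-
-If you want to double-check the setup, listing the collections in the scope is a quick sanity check. The following optional snippet is a small sketch that reuses the same cluster object, collection-manager APIs, and variables already defined above:
-
-
-```python
-# Optional sanity check (sketch): list the collections that now exist in our
-# scope, using the same collection-manager APIs as setup_collection above.
-bucket = cluster.bucket(CB_BUCKET_NAME)
-for scope in bucket.collections().get_all_scopes():
-    if scope.name == SCOPE_NAME:
-        print(f"Collections in '{SCOPE_NAME}':", [c.name for c in scope.collections])
-```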
-
-
-# Loading Couchbase Vector Search Index
-
-Semantic search requires an efficient way to retrieve relevant documents based on a user's query. This is where the Couchbase **Vector Search Index** comes into play. In this step, we load the Vector Search Index definition from a JSON file, which specifies how the index should be structured. This includes the fields to be indexed, the dimensions of the vectors, and other parameters that determine how the search engine processes queries based on vector similarity.
-
-This Voyage vector search index configuration requires specific default settings to function properly. This tutorial uses the bucket named `vector-search-testing` with the scope `shared` and collection `voyage`. The configuration is set up for vectors with exactly `1536 dimensions`, using dot product similarity and optimized for recall. If you want to use a different bucket, scope, or collection, you will need to modify the index configuration accordingly.
-
-For more information on creating a vector search index, please follow the [instructions](https://docs.couchbase.com/cloud/vector-search/create-vector-search-index-ui.html).
-
-
-```python
-# If you are running this script locally (not in Google Colab), uncomment the following line
-# and provide the path to your index definition file.
-
-# index_definition_path = '/path_to_your_index_file/voyage_index.json' # Local setup: specify your file path here
-
-# # Version for Google Colab
-# def load_index_definition_colab():
-#     from google.colab import files
-#     print("Upload your index definition file")
-#     uploaded = files.upload()
-#     index_definition_path = list(uploaded.keys())[0]
-
-#     try:
-#         with open(index_definition_path, 'r') as file:
-#             index_definition = json.load(file)
-#         return index_definition
-#     except Exception as e:
-#         raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}")
-
-# Version for Local Environment
-def load_index_definition_local(index_definition_path):
-    try:
-        with open(index_definition_path, 'r') as file:
-            index_definition = json.load(file)
-        return index_definition
-    except Exception as e:
-        raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}")
-
-# Usage
-# Uncomment the appropriate line based on your environment
-# index_definition = load_index_definition_colab()
-index_definition = load_index_definition_local('voyage_index.json')
-```
-
-# Creating or Updating Search Indexes
-
-With the index definition loaded, the next step is to create or update the **Vector Search Index** in Couchbase. This step is crucial because it optimizes our database for vector similarity search operations, allowing us to perform searches based on the semantic content of documents rather than just keywords. By creating or updating a Vector Search Index, we enable our search engine to handle complex queries that involve finding semantically similar documents using vector embeddings, which is essential for a robust semantic search engine.
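-
-As a point of reference before creating the index below, here is a sketch of what the `voyage_index.json` definition loaded above might contain. This is an illustrative example rather than the exact file: it follows the index-definition conventions used elsewhere in this cookbook, with `1536` dimensions and dot-product similarity, and it assumes the LangChain vector store's default `embedding` and `text` field names — adjust the names if your setup differs.
-
-```json
-{
-  "name": "vector_search_voyage",
-  "type": "fulltext-index",
-  "params": {
-    "doc_config": {
-      "mode": "scope.collection.type_field",
-      "type_field": "type"
-    },
-    "mapping": {
-      "default_analyzer": "standard",
-      "default_mapping": {
-        "dynamic": true,
-        "enabled": false
-      },
-      "types": {
-        "shared.voyage": {
-          "dynamic": true,
-          "enabled": true,
-          "properties": {
-            "embedding": {
-              "enabled": true,
-              "dynamic": false,
-              "fields": [
-                {
-                  "dims": 1536,
-                  "index": true,
-                  "name": "embedding",
-                  "similarity": "dot_product",
-                  "type": "vector",
-                  "vector_index_optimized_for": "recall"
-                }
-              ]
-            },
-            "text": {
-              "enabled": true,
-              "dynamic": false,
-              "fields": [
-                {
-                  "index": true,
-                  "name": "text",
-                  "store": true,
-                  "type": "text"
-                }
-              ]
-            }
-          }
-        }
-      }
-    },
-    "store": {
-      "indexType": "scorch",
-      "segmentVersion": 16
-    }
-  },
-  "sourceType": "gocbcore",
-  "sourceName": "vector-search-testing"
-}
-```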
- - -```python -try: - scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes() - - # Check if index already exists - existing_indexes = scope_index_manager.get_all_indexes() - index_name = index_definition["name"] - - if index_name in [index.name for index in existing_indexes]: - logging.info(f"Index '{index_name}' found") - else: - logging.info(f"Creating new index '{index_name}'...") - - # Create SearchIndex object from JSON definition - search_index = SearchIndex.from_json(index_definition) - - # Upsert the index (create if not exists, update if exists) - scope_index_manager.upsert_index(search_index) - logging.info(f"Index '{index_name}' successfully created/updated.") - -except QueryIndexAlreadyExistsException: - logging.info(f"Index '{index_name}' already exists. Skipping creation/update.") -except ServiceUnavailableException: - raise RuntimeError("Search service is not available. Please ensure the Search service is enabled in your Couchbase cluster.") -except InternalServerFailureException as e: - logging.error(f"Internal server error: {str(e)}") - raise -``` - - 2025-04-21 13:43:33,489 - INFO - Index 'vector_search_voyage' found - 2025-04-21 13:43:33,505 - INFO - Index 'vector_search_voyage' already exists. Skipping creation/update. - - -# Create Embeddings -Embeddings are created using the Voyage API. Embeddings are vectors (arrays of numbers) that represent the meaning of text in a high-dimensional space. These embeddings are crucial for tasks like semantic search, where the goal is to find text that is semantically similar to a query. The script uses a pre-trained model provided by Voyage to generate embeddings for the text in the dataset. - - -```python -try: - embeddings = VoyageAIEmbeddings(voyage_api_key=VOYAGE_API_KEY,model="voyage-large-2") - logging.info("Successfully created VoyageAIEmbeddings") -except Exception as e: - raise ValueError(f"Error creating VoyageAIEmbeddings: {str(e)}") -``` - - 2025-02-24 01:02:29,797 - INFO - Successfully created VoyageAIEmbeddings - - -# Set Up Vector Store -The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched. - - - -```python -try: - vector_store = CouchbaseSearchVectorStore( - cluster=cluster, - bucket_name=CB_BUCKET_NAME, - scope_name=SCOPE_NAME, - collection_name=COLLECTION_NAME, - embedding=embeddings, - index_name=INDEX_NAME, - ) - logging.info("Successfully created vector store") -except Exception as e: - raise ValueError(f"Failed to create vector store: {str(e)}") -``` - - 2025-02-24 01:02:34,123 - INFO - Successfully created vector store - - -# Load the BBC News Dataset -To build a search engine, we need data to search through. We use the BBC News dataset from RealTimeData, which provides real-world news articles. This dataset contains news articles from BBC covering various topics and time periods. Loading the dataset is a crucial step because it provides the raw material that our search engine will work with. The quality and diversity of the news articles make it an excellent choice for testing and refining our search engine, ensuring it can handle real-world news content effectively. 
-
-The BBC News dataset allows us to work with authentic news articles, enabling us to build and test a search engine that can effectively process and retrieve relevant news content. The dataset is loaded using the Hugging Face datasets library, specifically accessing the "RealTimeData/bbc_news_alltime" dataset with the "2024-12" version.
-
-
-```python
-try:
-    news_dataset = load_dataset(
-        "RealTimeData/bbc_news_alltime", "2024-12", split="train"
-    )
-    print(f"Loaded the BBC News dataset with {len(news_dataset)} rows")
-    logging.info(f"Successfully loaded the BBC News dataset with {len(news_dataset)} rows.")
-except Exception as e:
-    raise ValueError(f"Error loading the BBC News dataset: {str(e)}")
-```
-
-    2025-02-24 01:02:39,306 - INFO - Successfully loaded the BBC News dataset with 2687 rows.
-
-
-    Loaded the BBC News dataset with 2687 rows
-
-
-## Cleaning up the Data
-We will use the content of the news articles for our RAG system.
-
-The dataset contains a few duplicate records. We are removing them to avoid duplicate results in the retrieval stage of our RAG system.
-
-
-```python
-news_articles = news_dataset["content"]
-unique_articles = set()
-for article in news_articles:
-    if article:
-        unique_articles.add(article)
-unique_news_articles = list(unique_articles)
-print(f"We have {len(unique_news_articles)} unique articles in our database.")
-```
-
-    We have 1749 unique articles in our database.
-
-
-## Saving Data to the Vector Store
-To efficiently handle the large number of articles, we process them in batches of 25 articles at a time. This batch processing approach helps manage memory usage and provides better control over the ingestion process.
-
-We first filter out any articles that exceed 50,000 characters to avoid potential issues with token limits. Then, using the vector store's `add_texts` method, we add the filtered articles to our vector database. The `batch_size` parameter controls how many articles are processed in each iteration.
-
-This approach offers several benefits:
-1. Memory Efficiency: Processing in smaller batches prevents memory overload
-2. Error Handling: If an error occurs, only the current batch is affected
-3. Progress Tracking: Easier to monitor and track the ingestion progress
-4. Resource Management: Better control over CPU and network resource utilization
-
-We use a conservative batch size of 25 to ensure reliable operation.
-The optimal batch size depends on many factors, including:
-- Document sizes being inserted
-- Available system resources
-- Network conditions
-- Concurrent workload
-
-Consider measuring performance with your specific workload before adjusting.
-
-
-
-```python
-batch_size = 25
-
-# Automatic Batch Processing
-articles = [article for article in unique_news_articles if article and len(article) <= 50000]
-
-try:
-    vector_store.add_texts(
-        texts=articles,
-        batch_size=batch_size
-    )
-    logging.info("Document ingestion completed successfully.")
-except Exception as e:
-    raise ValueError(f"Failed to save documents to vector store: {str(e)}")
-
-```
-
-    2025-02-24 01:39:56,883 - INFO - Document ingestion completed successfully.
-
-
-# Set Up Cache
-A cache is set up using Couchbase to store intermediate results and frequently accessed data. Caching is important for improving performance, as it reduces the need to repeatedly calculate or retrieve the same data. The cache is linked to a specific collection in Couchbase, and it is used later in the script to store the results of language model queries.
-
-
-```python
-try:
-    cache = CouchbaseCache(
-        cluster=cluster,
-        bucket_name=CB_BUCKET_NAME,
-        scope_name=SCOPE_NAME,
-        collection_name=CACHE_COLLECTION,
-    )
-    logging.info("Successfully created cache")
-    set_llm_cache(cache)
-except Exception as e:
-    raise ValueError(f"Failed to create cache: {str(e)}")
-```
-
-    2025-02-24 01:39:59,753 - INFO - Successfully created cache
-
-
-# Create Language Model (LLM)
-The script initializes an OpenAI language model (LLM) that will be used for generating responses to queries. LLMs are powerful tools for natural language understanding and generation, capable of producing human-like text based on input prompts. The model is configured with specific parameters, such as the temperature, which controls the randomness of its outputs.
-
-
-
-```python
-try:
-    llm = ChatOpenAI(
-        openai_api_key=OPENAI_API_KEY,
-        model="gpt-4o-2024-08-06",
-        temperature=0
-    )
-    logging.info("Successfully created OpenAI LLM with model gpt-4o-2024-08-06")
-except Exception as e:
-    raise ValueError(f"Error creating OpenAI LLM: {str(e)}")
-```
-
-    2025-02-24 01:39:59,846 - INFO - Successfully created OpenAI LLM with model gpt-4o-2024-08-06
-
-
-# Perform Semantic Search
-Semantic search in Couchbase involves converting queries and documents into vector representations using an embeddings model. These vectors capture the semantic meaning of the text and are stored directly in Couchbase. When a query is made, Couchbase performs a similarity search by comparing the query vector against the stored document vectors. The similarity metric used for this comparison is configurable, allowing flexibility in how the relevance of documents is determined.
-
-In the provided code, the search process begins by recording the start time, followed by executing the similarity_search_with_score method of the CouchbaseSearchVectorStore. This method searches Couchbase for the most relevant documents based on their vector similarity to the query, returning each document's content along with a similarity score that reflects how closely it aligns with the query in the defined semantic space. The time taken to perform the search is then calculated, logged, and displayed alongside the results. This approach leverages Couchbase as both the storage and retrieval engine for vector data, enabling sophisticated and scalable semantic search operations without relying on external services for vector storage or comparison.
-
-
-```python
-query = "What was manchester city manager pep guardiola's reaction to the team's current form?"
- -try: - # Perform the semantic search - start_time = time.time() - search_results = vector_store.similarity_search_with_score(query, k=10) - search_elapsed_time = time.time() - start_time - - logging.info(f"Semantic search completed in {search_elapsed_time:.2f} seconds") - - # Display search results - print(f"\nSemantic Search Results (completed in {search_elapsed_time:.2f} seconds):") - print("-" * 80) # Add separator line - for doc, score in search_results: - print(f"Score: {score:.4f}, Text: {doc.page_content}") - print("-" * 80) # Add separator between results - -except CouchbaseException as e: - raise RuntimeError(f"Error performing semantic search: {str(e)}") -except Exception as e: - raise RuntimeError(f"Unexpected error: {str(e)}") -``` - - 2025-02-24 01:40:02,318 - INFO - Semantic search completed in 2.46 seconds - - - - Semantic Search Results (completed in 2.46 seconds): - -------------------------------------------------------------------------------- - Score: 0.7965, Text: 'Self-doubt, errors & big changes' - inside the crisis at Man City - - Pep Guardiola has not been through a moment like this in his managerial career. Manchester City have lost nine matches in their past 12 - as many defeats as they had suffered in their previous 106 fixtures. At the end of October, City were still unbeaten at the top of the Premier League and favourites to win a fifth successive title. Now they are seventh, 12 points behind leaders Liverpool having played a game more. It has been an incredible fall from grace and left people trying to work out what has happened - and whether Guardiola can make it right. After discussing the situation with those who know him best, I have taken a closer look at the future - both short and long term - and how the current crisis at Man City is going to be solved. - - Pep Guardiola's Man City have lost nine of their past 12 matches - - Guardiola has also been giving it a lot of thought. He has not been sleeping very well, as he has said, and has not been himself at times when talking to the media. He has been talking to a lot of people about what is going on as he tries to work out the reasons for City's demise. Some reasons he knows, others he still doesn't. What people perhaps do not realise is Guardiola hugely doubts himself and always has. He will be thinking "I'm not going to be able to get us out of this" and needs the support of people close to him to push away those insecurities - and he has that. He is protected by his people who are very aware, like he is, that there are a lot of people that want City to fail. It has been a turbulent time for Guardiola. Remember those marks he had on his head after the 3-3 draw with Feyenoord in the Champions League? He always scratches his head, it is a gesture of nervousness. Normally nothing happens but on that day one of his nails was far too sharp so, after talking to the players in the changing room where he scratched his head because of his usual agitated gesturing, he went to the news conference. His right-hand man Manel Estiarte sent him photos in a message saying "what have you got on your head?", but by the time Guardiola returned to the coaching room there was hardly anything there again. He started that day with a cover on his nose after the same thing happened at the training ground the day before. Guardiola was having a footballing debate with Kyle Walker about positional stuff and marked his nose with that same nail. 
There was also that remarkable news conference after the Manchester derby when he said "I don't know what to do". That is partly true and partly not true. Ignore the fact Guardiola suggested he was "not good enough". He actually meant he was not good enough to resolve the situation with the group of players he has available and with all the other current difficulties. There are obviously logical explanations for the crisis and the first one has been talked about many times - the absence of injured midfielder Rodri. You know the game Jenga? When you take the wrong piece out, the whole tower collapses. That is what has happened here. It is normal for teams to have an over-reliance on one player if he is the best in the world in his position. And you cannot calculate the consequences of an injury that rules someone like Rodri out for the season. City are a team, like many modern ones, in which the holding midfielder is a key element to the construction. So, when you take Rodri out, it is difficult to hold it together. There were Plan Bs - John Stones, Manuel Akanji, even Nathan Ake - but injuries struck. The big injury list has been out of the ordinary and the busy calendar has also played a part in compounding the issues. However, one factor even Guardiola cannot explain is the big uncharacteristic errors in almost every game from international players. Why did Matheus Nunes make that challenge to give away the penalty against Manchester United? Jack Grealish is sent on at the end to keep the ball and cannot do that. There are errors from Walker and other defenders. These are some of the best players in the world. Of course the players' mindset is important, and confidence is diminishing. Wrong decisions get taken so there is almost panic on the pitch instead of calm. There are also players badly out of form who are having to play because of injuries. Walker is now unable to hide behind his pace, I'm not sure Kevin de Bruyne is ever getting back to the level he used to be at, Bernardo Silva and Ilkay Gundogan do not have time to rest, Grealish is not playing at his best. Some of these players were only meant to be playing one game a week but, because of injuries, have played 12 games in 40 days. It all has a domino effect. One consequence is that Erling Haaland isn't getting the service to score. But the Norwegian still remains City's top-scorer with 13. Defender Josko Gvardiol is next on the list with just four. The way their form has been analysed inside the City camp is there have only been three games where they deserved to lose (Liverpool, Bournemouth and Aston Villa). But of course it is time to change the dynamic. - - Guardiola has never protected his players so much. He has not criticised them and is not going to do so. They have won everything with him. Instead of doing more with them, he has tried doing less. He has sometimes given them more days off to clear their heads, so they can reset - two days this week for instance. Perhaps the time to change a team is when you are winning, but no-one was suggesting Man City were about to collapse when they were top and unbeaten after nine league games. Some people have asked how bad it has to get before City make a decision on Guardiola. The answer is that there is no decision to be made. Maybe if this was Real Madrid, Barcelona or Juventus, the pressure from outside would be massive and the argument would be made that Guardiola has to go. At City he has won the lot, so how can anyone say he is failing? Yes, this is a crisis. 
But given all their problems, City's renewed target is finishing in the top four. That is what is in all their heads now. The idea is to recover their essence by improving defensive concepts that are not there and re-establishing the intensity they are known for. Guardiola is planning to use the next two years of his contract, which is expected to be his last as a club manager, to prepare a new Manchester City. When he was at the end of his four years at Barcelona, he asked two managers what to do when you feel people are not responding to your instructions. Do you go or do the players go? Sir Alex Ferguson and Rafael Benitez both told him that the players need to go. Guardiola did not listen because of his emotional attachment to his players back then and he decided to leave the Camp Nou because he felt the cycle was over. He will still protect his players now but there is not the same emotional attachment - so it is the players who are going to leave this time. It is likely City will look to replace five or six regular starters. Guardiola knows it is the end of an era and the start of a new one. Changes will not be immediate and the majority of the work will be done in the summer. But they are open to any opportunities in January - and a holding midfielder is one thing they need. In the summer City might want to get Spain's Martin Zubimendi from Real Sociedad and they know 60m euros (£50m) will get him. He said no to Liverpool last summer even though everything was agreed, but he now wants to move on and the Premier League is the target. Even if they do not get Zubimendi, that is the calibre of footballer they are after. A new Manchester City is on its way - with changes driven by Guardiola, incoming sporting director Hugo Viana and the football department. - -------------------------------------------------------------------------------- - Score: 0.7948, Text: Manchester City boss Pep Guardiola has won 18 trophies since he arrived at the club in 2016 - - Manchester City boss Pep Guardiola says he is "fine" despite admitting his sleep and diet are being affected by the worst run of results in his entire managerial career. In an interview with former Italy international Luca Toni for Amazon Prime Sport before Wednesday's Champions League defeat by Juventus, Guardiola touched on the personal impact City's sudden downturn in form has had. Guardiola said his state of mind was "ugly", that his sleep was "worse" and he was eating lighter as his digestion had suffered. City go into Sunday's derby against Manchester United at Etihad Stadium having won just one of their past 10 games. The Juventus loss means there is a chance they may not even secure a play-off spot in the Champions League. Asked to elaborate on his comments to Toni, Guardiola said: "I'm fine. "In our jobs we always want to do our best or the best as possible. When that doesn't happen you are more uncomfortable than when the situation is going well, always that happened. "In good moments I am happier but when I get to the next game I am still concerned about what I have to do. There is no human being that makes an activity and it doesn't matter how they do." Guardiola said City have to defend better and "avoid making mistakes at both ends". To emphasise his point, Guardiola referred back to the third game of City's current run, against a Sporting side managed by Ruben Amorim, who will be in the United dugout at the weekend. City dominated the first half in Lisbon, led thanks to Phil Foden's early effort and looked to be cruising. 
Instead, they conceded three times in 11 minutes either side of half-time as Sporting eventually ran out 4-1 winners. "I would like to play the game like we played in Lisbon on Sunday, believe me," said Guardiola, who is facing the prospect of only having three fit defenders for the derby as Nathan Ake and Manuel Akanji try to overcome injury concerns. If there is solace for City, it comes from the knowledge United are not exactly flying. Their comeback Europa League victory against Viktoria Plzen on Thursday was their third win of Amorim's short reign so far but only one of those successes has come in the Premier League, where United have lost their past two games against Arsenal and Nottingham Forest. Nevertheless, Guardiola can see improvements already on the red side of the city. "It's already there," he said. "You see all the patterns, the movements, the runners and the pace. He will do a good job at United, I'm pretty sure of that." - - Guardiola says skipper Kyle Walker has been offered support by the club after the City defender highlighted the racial abuse he had received on social media in the wake of the Juventus trip. "It's unacceptable," he said. "Not because it's Kyle - for any human being. "Unfortunately it happens many times in the real world. It is not necessary to say he has the support of the entire club. It is completely unacceptable and we give our support to him." - -------------------------------------------------------------------------------- - Score: 0.7755, Text: Pep Guardiola has said Manchester City will be his final managerial job in club football before he "maybe" coaches a national team. - - The former Barcelona and Bayern Munich boss has won 15 major trophies since taking charge of City in 2016. - - The 53-year-old Spaniard was approached in the summer about the possibility of becoming England manager, but last month signed a two-year contract extension with City until 2027. - - - ... (output truncated for brevity) - - -# Retrieval-Augmented Generation (RAG) with Couchbase and Langchain -Couchbase and LangChain can be seamlessly integrated to create RAG (Retrieval-Augmented Generation) chains, enhancing the process of generating contextually relevant responses. In this setup, Couchbase serves as the vector store, where embeddings of documents are stored. When a query is made, LangChain retrieves the most relevant documents from Couchbase by comparing the query’s embedding with the stored document embeddings. These documents, which provide contextual information, are then passed to a generative language model within LangChain. - -The language model, equipped with the context from the retrieved documents, generates a response that is both informed and contextually accurate. This integration allows the RAG chain to leverage Couchbase’s efficient storage and retrieval capabilities, while LangChain handles the generation of responses based on the context provided by the retrieved documents. Together, they create a powerful system that can deliver highly relevant and accurate answers by combining the strengths of both retrieval and generation. - - -```python -try: - template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. 
Answer the question as truthfully as possible using the context below: - {context} - Question: {question}""" - prompt = ChatPromptTemplate.from_template(template) - rag_chain = ( - {"context": vector_store.as_retriever(), "question": RunnablePassthrough()} - | prompt - | llm - | StrOutputParser() - ) - logging.info("Successfully created RAG chain") -except Exception as e: - raise ValueError(f"Error creating LLM chains: {str(e)}") -``` - - 2025-02-24 01:40:02,392 - INFO - Successfully created RAG chain - - - -```python -try: - # Get RAG response - start_time = time.time() - rag_response = rag_chain.invoke(query) - rag_elapsed_time = time.time() - start_time - - print(f"RAG Response: {rag_response}") - print(f"RAG response generated in {rag_elapsed_time:.2f} seconds") - -except InternalServerFailureException as e: - if "query request rejected" in str(e): - print("Error: Search request was rejected due to rate limiting. Please try again later.") - else: - print(f"Internal server error occurred: {str(e)}") -except Exception as e: - print(f"Unexpected error occurred: {str(e)}") -``` - - RAG Response: Pep Guardiola has expressed concern about Manchester City's current form, describing his state of mind as "ugly" and admitting that his sleep and diet have been affected. He acknowledged the team's poor run of results and emphasized the need to defend better and avoid mistakes. Despite the challenges, Guardiola remains calm and focused on finding solutions, expressing trust in his players and a determination to return to form. He has not criticized his players publicly and has instead offered them support, giving them more days off to reset. Guardiola is planning for the future, acknowledging the end of an era and the need for changes in the team. - RAG response generated in 7.56 seconds - - -# Using Couchbase as a caching mechanism -Couchbase can be effectively used as a caching mechanism for RAG (Retrieval-Augmented Generation) responses by storing and retrieving precomputed results for specific queries. This approach enhances the system's efficiency and speed, particularly when dealing with repeated or similar queries. When a query is first processed, the RAG chain retrieves relevant documents, generates a response using the language model, and then stores this response in Couchbase, with the query serving as the key. - -For subsequent requests with the same query, the system checks Couchbase first. If a cached response is found, it is retrieved directly from Couchbase, bypassing the need to re-run the entire RAG process. This significantly reduces response time because the computationally expensive steps of document retrieval and response generation are skipped. Couchbase's role in this setup is to provide a fast and scalable storage solution for caching these responses, ensuring that frequently asked queries can be answered more quickly and efficiently. 
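-
-To see what actually lands in the cache, you can peek into the cache collection after a query has been answered. The snippet below is an optional sketch: it relies on the primary index created during collection setup, and the exact document shape is an implementation detail of LangChain's `CouchbaseCache` (entries are keyed by a hash derived from the prompt and LLM configuration).
-
-
-```python
-# Peek at one raw cache entry (optional sketch). The document layout is
-# internal to CouchbaseCache, so treat this as a debugging aid only.
-result = cluster.query(
-    f"SELECT META().id AS id FROM `{CB_BUCKET_NAME}`.`{SCOPE_NAME}`.`{CACHE_COLLECTION}` LIMIT 1"
-)
-for row in result:
-    print("Cached entry id:", row["id"])
-```
-
-The cell below then issues repeated queries to show the difference in response time between cache misses and cache hits.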
-
-
-```python
-try:
-    queries = [
-        "What happened in the match between Fulham and Liverpool?",
-        "What was manchester city manager pep guardiola's reaction to the team's current form?", # Repeated query
-        "What happened in the match between Fulham and Liverpool?", # Repeated query
-    ]
-
-    for i, query in enumerate(queries, 1):
-        print(f"\nQuery {i}: {query}")
-        start_time = time.time()
-
-        response = rag_chain.invoke(query)
-        elapsed_time = time.time() - start_time
-        print(f"Response: {response}")
-        print(f"Time taken: {elapsed_time:.2f} seconds")
-except InternalServerFailureException as e:
-    if "query request rejected" in str(e):
-        print("Error: Search request was rejected due to rate limiting. Please try again later.")
-    else:
-        print(f"Internal server error occurred: {str(e)}")
-except Exception as e:
-    print(f"Unexpected error occurred: {str(e)}")
-```
-
-
-    Query 1: What happened in the match between Fulham and Liverpool?
-    Response: In the match between Fulham and Liverpool, the game ended in a 2-2 draw. Liverpool played the majority of the match with ten men after Andy Robertson received a red card in the 17th minute. Despite being a player down, Liverpool managed to equalize twice, with Diogo Jota scoring an 86th-minute equalizer. The performance was praised as impressive, with Liverpool maintaining more than 60% possession and leading in several attacking metrics.
-    Time taken: 6.54 seconds
-
-    Query 2: What was manchester city manager pep guardiola's reaction to the team's current form?
-    Response: Pep Guardiola has expressed concern about Manchester City's current form, describing his state of mind as "ugly" and admitting that his sleep and diet have been affected. He acknowledged the team's poor run of results and emphasized the need to defend better and avoid mistakes. Despite the challenges, Guardiola remains calm and focused on finding solutions, expressing trust in his players and a determination to return to form. He has not criticized his players publicly and has instead offered them support, giving them more days off to reset. Guardiola is planning for the future, acknowledging the end of an era and the need for changes in the team.
-    Time taken: 1.98 seconds
-
-    Query 3: What happened in the match between Fulham and Liverpool?
-    Response: In the match between Fulham and Liverpool, the game ended in a 2-2 draw. Liverpool played the majority of the match with ten men after Andy Robertson received a red card in the 17th minute. Despite being a player down, Liverpool managed to equalize twice, with Diogo Jota scoring an 86th-minute equalizer. The performance was praised as impressive, with Liverpool maintaining more than 60% possession and leading in several attacking metrics.
-    Time taken: 1.85 seconds
-
-
-## Conclusion
-By following these steps, you’ll have a fully functional semantic search engine that leverages the strengths of Couchbase and Voyage. This guide is designed not just to show you how to build the system, but also to explain why each step is necessary, giving you a deeper understanding of the principles behind semantic search and how to implement it effectively. Whether you’re a newcomer to software development or an experienced developer looking to expand your skills, this guide will provide you with the knowledge and tools you need to create a powerful, AI-driven search engine.
diff --git a/tutorial/markdown/generated/vector-search-cookbook/agentchat_RetrieveChat_couchbase.md b/tutorial/markdown/generated/vector-search-cookbook/agentchat_RetrieveChat_couchbase.md
deleted file mode 100644
index 21719e6..0000000
--- a/tutorial/markdown/generated/vector-search-cookbook/agentchat_RetrieveChat_couchbase.md
+++ /dev/null
@@ -1,562 +0,0 @@
----
-# frontmatter
-path: "/tutorial-couchbase-ag2-rag"
-title: Build an Agentic RAG Application with Couchbase and AG2
-short_title: RAG with Couchbase & AG2
-description:
-  - Learn how Couchbase and AG2 simplify RAG applications
-  - Store and retrieve document embeddings with Couchbase Vector Search
-  - Build an AI agent that answers questions from documentation links
-content_type: tutorial
-filter: sdk
-technology:
-  - vector search
-tags:
-  - Artificial Intelligence
-  - Autogen
-  - Ag2
-sdk_language:
-  - python
-length: 40 Mins
----
-
-
-
-
-[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/ag2/agentchat_RetrieveChat_couchbase.ipynb)
-
-# Using RetrieveChat Powered by Couchbase Capella for Retrieve Augmented Code Generation and Question Answering
-
-This tutorial will show you how we've made building Retrieval-Augmented Generation (RAG) applications much easier with [Couchbase](https://www.couchbase.com/) and [AG2](https://ag2.ai/). By leveraging [Couchbase's Search vector index](https://docs.couchbase.com/cloud/vector-search/vector-search.html) for storing and retrieving document embeddings, along with [AG2's powerful AI capabilities](https://docs.ag2.ai/docs/user-guide/basic-concepts/installing-ag2), our integration simplifies the entire process. As part of this tutorial, we'll also build a demo application where an AI agent can answer questions based on documentation links provided for any framework. This hands-on approach will demonstrate how effortlessly you can create intelligent, context-aware AI applications using this integration.
-
-RetrieveChat is a conversational system for retrieval-augmented code generation and question answering. In this notebook, we demonstrate how to use RetrieveChat to generate code and answer questions based on customized documentation that is not present in the LLM's training dataset. RetrieveChat uses the `AssistantAgent` and `RetrieveUserProxyAgent`, similar to the usage of `AssistantAgent` and `UserProxyAgent` in other notebooks (e.g., [Automated Task Solving with Code Generation, Execution & Debugging](https://docs.ag2.ai/docs/use-cases/notebooks/notebooks/agentchat_auto_feedback_from_code_execution)). Essentially, `RetrieveUserProxyAgent` implements a different auto-reply mechanism corresponding to the RetrieveChat prompts.
-
-Some extra dependencies are needed for this notebook; they can be installed via pip:
-
-
-```python
-%pip install "pyautogen[openai,retrievechat-couchbase]==0.8.7" "flaml[automl]==2.3.4" couchbase==4.3.3
-# For more information, please refer to the [installation guide](/docs/installation/).
-```
-
-## Environment Setup
-
-# Couchbase Capella Setup Instructions
-
-Before we proceed with the notebook, we need a Couchbase Capella database cluster up and running.
-
-## Setting Up a Free Cluster
-- To set up a free operational cluster, head over to [Couchbase Cloud](https://cloud.couchbase.com) and create an account. There, create a free cluster. For more details on creating a cluster, [refer here](https://docs.couchbase.com/cloud/get-started/create-account.html).
-
-## Creating Required Resources
-- After creating the cluster, we need to create our required bucket, scope, and collections. Head over to **Data Tools**. On the left-hand side panel, you will find an option to create a bucket. Assign appropriate names for the Bucket, Scope, and Collection. For this tutorial, use the following:
-  - **Bucket Name**: `sample_bucket`
-  - **Scope Name**: `sample_scope`
-  - **Collection Name**: `sample_collection`
-  - **Vector Search Index Name**: `vector_index`
-
-## Creating a Search Index
-Before proceeding further, we need to set up a search index for vector-based retrieval. This is essential for efficient querying in our RAG pipeline. Follow the steps below:
-
-- [Couchbase Capella](https://docs.couchbase.com/cloud/search/import-search-index.html)
-  - Copy the index definition below to a new file `index.json`
-  - Import the file in Capella using the instructions in the documentation.
-  - Click on Create Index to create the index.
-
-- [Couchbase Server](https://docs.couchbase.com/server/current/search/import-search-index.html)
-  - Click on Search -> Add Index -> Import
-  - Copy the following index definition in the Import screen
-  - Click on Create Index to create the index.
-
-#### Index Definition
-
-In the definition below, the index maps to `sample_bucket`.`sample_scope`.`sample_collection`; if you used different names, update the `sourceName` and the type mapping accordingly.
-
-```json
-{
-    "name": "vector_index",
-    "type": "fulltext-index",
-    "params": {
-        "doc_config": {
-            "docid_prefix_delim": "",
-            "docid_regexp": "",
-            "mode": "scope.collection.type_field",
-            "type_field": "type"
-        },
-        "mapping": {
-            "default_analyzer": "standard",
-            "default_datetime_parser": "dateTimeOptional",
-            "default_field": "_all",
-            "default_mapping": {
-                "dynamic": true,
-                "enabled": false
-            },
-            "default_type": "_default",
-            "docvalues_dynamic": false,
-            "index_dynamic": true,
-            "store_dynamic": false,
-            "type_field": "_type",
-            "types": {
-                "sample_scope.sample_collection": {
-                    "dynamic": true,
-                    "enabled": true,
-                    "properties": {
-                        "embedding": {
-                            "enabled": true,
-                            "dynamic": false,
-                            "fields": [
-                                {
-                                    "dims": 384,
-                                    "index": true,
-                                    "name": "embedding",
-                                    "similarity": "dot_product",
-                                    "type": "vector",
-                                    "vector_index_optimized_for": "recall"
-                                }
-                            ]
-                        },
-                        "content": {
-                            "enabled": true,
-                            "dynamic": false,
-                            "fields": [
-                                {
-                                    "index": true,
-                                    "name": "content",
-                                    "store": true,
-                                    "type": "text"
-                                }
-                            ]
-                        }
-                    }
-                }
-            }
-        },
-        "store": {
-            "indexType": "scorch",
-            "segmentVersion": 16
-        }
-    },
-    "sourceType": "gocbcore",
-    "sourceName": "sample_bucket",
-    "sourceParams": {},
-    "planParams": {
-        "maxPartitionsPerPIndex": 64,
-        "indexPartitions": 16,
-        "numReplicas": 0
-    }
-}
-```
-
-## Connecting to the Cluster
-- Now, we will connect to the cluster. [Refer to this page for connection details](https://docs.couchbase.com/cloud/get-started/connect.html).
-
-- **Create a user to connect:**
-  - Navigate to the **Settings** tab.
-  - Click **Create Cluster Access** and specify a username and password.
-  - Assign **read/write access to all buckets** (you may create more users with restricted permissions as needed).
-  - For more details, [refer here](https://docs.couchbase.com/cloud/clusters/manage-database-users.html#create-database-credentials).
-
-- **Add an IP Address to the allowed list:**
-  - In **Settings**, click on **Networking**.
-  - Add an [allowed IP](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) based on your requirements.
-
-- **Set up environment variables:**
-  Retrieve the connection string from the **Connect** tab. Then, configure the following environment variables:
-  - `CB_CONN_STR`: Couchbase Cluster Connection String
-  - `CB_USERNAME`: Username of the created user
-  - `CB_PASSWORD`: Password of the created user
-  - `OPENAI_API_KEY`: OpenAI API Key (required for agents)
-
-
-```python
-import os
-# Environment Variables
-os.environ["CB_CONN_STR"] = "<>"
-os.environ["CB_USERNAME"] = "<>"
-os.environ["CB_PASSWORD"] = "<>"
-os.environ["OPENAI_API_KEY"] = "<>"
-
-# You can change the values below, but then you will also have to update them
-# in the vector search index created on the Couchbase cluster.
-os.environ["CB_BUCKET"] = "sample_bucket"
-os.environ["CB_SCOPE"] = "sample_scope"
-os.environ["CB_COLLECTION"] = "sample_collection"
-os.environ["CB_INDEX_NAME"] = "vector_index"
-```
-
-**Voila! Your cluster is now ready to be used.**
-
-## Initializing Agents
-
-We start by initializing the `AssistantAgent` and `RetrieveUserProxyAgent`. For the `AssistantAgent`, the system message is set to "You are a helpful assistant."; the detailed instructions are given in the user message. Later, we will use `RetrieveUserProxyAgent.message_generator` to combine the instructions and a retrieval-augmented generation task into an initial prompt to be sent to the LLM assistant.
-
-
-```python
-import os
-import sys
-
-from autogen import AssistantAgent
-
-sys.path.append(os.path.abspath("/workspaces/autogen/autogen/agentchat/contrib"))
-
-from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
-
-# Accepted file formats that can be stored in
-# a vector database instance
-from autogen.retrieve_utils import TEXT_FORMATS
-
-config_list = [{"model": "gpt-4o", "api_key": os.environ["OPENAI_API_KEY"], "api_type": "openai"}]
-assert len(config_list) > 0
-print("models to use: ", [config_list[i]["model"] for i in range(len(config_list))])
-```
-
-
-```python
-print("Accepted file formats for `docs_path`:")
-print(TEXT_FORMATS)
-```
-
-### Understanding `AssistantAgent` in AutoGen
-
-The `AssistantAgent` in AutoGen is a specialized subclass of `ConversableAgent` designed to perform tasks using large language models (LLMs). By default, it generates code suggestions and debugging assistance but does not execute code autonomously; it relies on user intervention for code execution.
-
-**Key Components of the `AssistantAgent` Initialization:**
-
-- **`name`**: Assigns a unique identifier to the agent.
-
-- **`system_message`**: Sets the default behavior and role of the agent. In this case, it's initialized with "You are a helpful assistant," guiding the agent to provide assistance aligned with this directive.
-
-- **`llm_config`**: Configures the LLM's behavior with parameters like timeout settings, caching mechanisms, and a list of model configurations (`config_list`).
-
-**How `AssistantAgent` Operates:**
-
-Once initialized, the `AssistantAgent` can interact with users or other agents to process tasks. It leverages the specified LLM configurations to generate responses, code snippets, or debugging advice based on the input it receives. However, it does not execute code by default, awaiting user approval or execution commands.
-
-For more detailed information, refer to the official AG2 documentation on [`AssistantAgent`](https://docs.ag2.ai/docs/api-reference/autogen/AssistantAgent).
-
-### Implementing `AssistantAgent` for LLM-Powered Assistance
-
-The provided code snippet demonstrates the creation of an `AssistantAgent` instance named "assistant" using the AutoGen framework. 
The `AssistantAgent` class is designed to interact with large language models (LLMs) to solve tasks, including suggesting Python code blocks and debugging. By default, it does not execute code and expects the user to handle code execution. - -- **`name="assistant"`**: Assigns the name "assistant" to the agent. - -- **`system_message="You are a helpful assistant."`**: Sets a system message that defines the assistant's role and behavior during interactions. - -- **`llm_config={...}`**: Provides configuration settings for the LLM: - - **`timeout=600`**: Specifies a timeout of 600 seconds for LLM responses. - - **`cache_seed=42`**: Sets a seed value for caching mechanisms to ensure consistent results. - - **`config_list=config_list`**: Includes a list of additional configurations, which can define specific LLM models or parameters to use. - -By default, the `AssistantAgent` has `human_input_mode` set to "NEVER" and `code_execution_config` set to `False`, meaning it doesn't execute code and doesn't require human input during interactions. - - -```python -# 1. create an AssistantAgent instance named "assistant" -assistant = AssistantAgent( # As defined above - name="assistant", - system_message="You are a helpful assistant.", - llm_config={ - "timeout": 600, - "cache_seed": 42, - "config_list": config_list, - }, -) -print("AssistantAgent instance created, with the configurations as defined above.") -``` - -## Fetching Documentation - -The following function recursively fetches all unique internal links from the given documentation URL within a specified time limit. This is useful for gathering documentation pages that will be used to augment the LLM's responses. - - -```python -import requests -from bs4 import BeautifulSoup -from urllib.parse import urljoin, urlparse -import time -import os -from concurrent.futures import ThreadPoolExecutor, as_completed - -def get_documentation_links(base_url, visited=None, start_time=None, time_limit=10): - """ - Recursively fetch all unique internal links from the given documentation URL with a time constraint. - - Args: - base_url (str): The URL of the documentation homepage. - visited (set): A set to keep track of visited URLs. - start_time (float): The start time of execution. - time_limit (int): The maximum time allowed for execution in seconds. - - Returns: - list: A list of unique internal links found in the documentation. - """ - if visited is None: - visited = set() - if start_time is None: - start_time = time.time() - - # Stop recursion if time limit is exceeded - if time.time() - start_time > time_limit: - return list(visited) - - try: - response = requests.get(base_url, timeout=5) - response.raise_for_status() - except requests.RequestException as e: - print(f"Error fetching the page: {e}") - return list(visited) - - soup = BeautifulSoup(response.text, "html.parser") - domain = urlparse(base_url).netloc - - links = set() - for a_tag in soup.find_all("a", href=True): - href = a_tag["href"].strip() - full_url = urljoin(base_url, href) - parsed_url = urlparse(full_url) - - if parsed_url.netloc == domain and full_url not in visited: # Ensure it's an internal link within the same domain - visited.add(full_url) - links.add(full_url) - links.update(get_documentation_links(full_url, visited, start_time, time_limit)) # Recursive call with time check - - return list(visited) - -``` - - -```python -def fetch_content_generators(links, num_workers=5): - """ - Splits the links into separate lists for each worker and returns generators for each worker. 
- Extracts only plain text from HTML before yielding. - - Args: - links (list): List of URLs to fetch content from. - num_workers (int): Number of workers, each receiving a distinct set of links. - - Returns: - list: A list of generators, one for each worker. - """ - def fetch_content(sub_links): - for link in sub_links: - try: - response = requests.get(link, timeout=5) - response.raise_for_status() - - # Extract plain text from HTML - soup = BeautifulSoup(response.text, "html.parser") - text_content = soup.get_text() - - yield link, text_content - except requests.RequestException as e: - print(f"Error fetching {link}: {e}") - yield link, None - - # Split links into chunks for each worker - chunk_size = (len(links) + num_workers - 1) // num_workers # Ensure even distribution - link_chunks = [links[i:i + chunk_size] for i in range(0, len(links), chunk_size)] - - return [fetch_content(chunk) for chunk in link_chunks] -``` - - -```python -def save_content_to_files(links, output_folder="docs_data", num_workers=5): - """ - Uses fetch_content_generators to fetch content in parallel and save it to local files. - - Args: - links (list): List of URLs to fetch content from. - output_folder (str): Folder to store the saved text files. - num_workers (int): Number of workers for parallel processing. - - Returns: - list: A list of file paths where content is saved. - """ - os.makedirs(output_folder, exist_ok=True) - generators = fetch_content_generators(links, num_workers=num_workers) - - file_paths = [] - - def process_and_save(gen, worker_id): - local_paths = [] - for j, (url, content) in enumerate(gen): - if content: # Avoid saving empty or failed fetches - file_name = f"doc_{worker_id}_{j}.txt" - file_path = os.path.join(output_folder, file_name) - with open(file_path, "w", encoding="utf-8") as f: - f.write(content) - local_paths.append(file_path) - return local_paths - - with ThreadPoolExecutor(max_workers=num_workers) as executor: - futures = {executor.submit(process_and_save, gen, i): i for i, gen in enumerate(generators)} - for future in as_completed(futures): - file_paths.extend(future.result()) - - return file_paths -``` - -### 📌 Input Documentation Link Here -Please enter the link to the documentation below. - - -```python -default_link = "https://docs.ag2.ai/docs/use-cases/notebooks/Notebooks" -main_doc_link = input(f"Enter documentation link: ") or default_link -print("Selected link:", main_doc_link) -``` - - -```python -docs_links = get_documentation_links(main_doc_link, None, None, 5) -docs_file_paths = save_content_to_files(docs_links, "./docs", 12) -``` - - -```python -len(docs_file_paths), len(docs_links) -``` - - - - - (454, 454) - - - -## Using RetrieveChat - -The `RetrieveUserProxyAgent` in AutoGen is a specialized agent designed to facilitate retrieval-augmented generation (RAG) by leveraging external knowledge sources, typically a vector database. It acts as an intermediary between the user and an AI assistant, ensuring that relevant context is retrieved and supplied to the assistant for more informed responses. - - - -### **How RetrieveUserProxyAgent Works** - - -1. **Query Processing & Context Retrieval** - When the user submits a question, the `RetrieveUserProxyAgent` first determines if the available context is sufficient. If not, it retrieves additional relevant information from an external knowledge base (e.g., a vector database) using similarity search. - -2. 
**Interaction with the Assistant**
   Once the relevant context is retrieved, the agent forwards both the user's query and the retrieved context to the `AssistantAgent` (such as an OpenAI-based model). This step ensures that the assistant generates an informed and contextually accurate response.

3. **Handling Responses**
   - If the assistant's response satisfies the user, the conversation ends.
   - If the response is unsatisfactory or additional context is needed, the agent updates the context and repeats the retrieval process.

4. **User Feedback & Iteration**
   - The user can provide feedback, request refinements, or terminate the interaction.
   - If updates are needed, the agent refines the context and interacts with the assistant again.

![Retrieval-Augmented Assistant](https://microsoft.github.io/autogen/0.2/assets/images/retrievechat-arch-959e180405c99ceb3da88a441c02f45e.png)

Source: [Retrieval-Augmented Generation (RAG) Applications with AutoGen](https://microsoft.github.io/autogen/0.2/blog/2023/10/18/RetrieveChat/)

### **Configuring `RetrieveUserProxyAgent` with Custom Text Splitting and OpenAI Embeddings for RAG**

This code snippet demonstrates how to configure a `RetrieveUserProxyAgent` in AutoGen with a custom text splitter and an OpenAI-based embedding function for retrieval-augmented generation (RAG). It uses `RecursiveCharacterTextSplitter` to break documents into structured chunks for better embedding and retrieval.

The embedding function is set up using OpenAI's `text-embedding-3-small` model, but you can alternatively use the default **SentenceTransformers** embedding model. The `RetrieveUserProxyAgent` is then initialized with a predefined task, auto-reply constraints, and a document retrieval path, enabling it to fetch relevant context dynamically and generate accurate responses in an automated workflow.
```python
from chromadb.utils import embedding_functions
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Initialize a recursive character text splitter with specified separators
recur_spliter = RecursiveCharacterTextSplitter(separators=["\n", "\r", "\t"])

# Option 1: Using OpenAI Embeddings
openai_ef = embedding_functions.OpenAIEmbeddingFunction(
    api_key=os.environ["OPENAI_API_KEY"],
    model_name="text-embedding-3-small",
)

ragproxyagent = RetrieveUserProxyAgent(
    name="ragproxyagent",
    human_input_mode="NEVER",
    max_consecutive_auto_reply=2,
    retrieve_config={
        "task": "code",
        "docs_path": docs_file_paths,
        "chunk_token_size": 1200,  # Defines chunk size for document splitting
        "model": config_list[0]["model"],
        "vector_db": "couchbase",  # Using Couchbase Capella VectorDB
        "collection_name": os.environ["CB_COLLECTION"],  # Collection name in Couchbase
        "db_config": {
            "connection_string": os.environ["CB_CONN_STR"],  # Connection string for Couchbase
            "username": os.environ["CB_USERNAME"],  # Couchbase username
            "password": os.environ["CB_PASSWORD"],  # Couchbase password
            "bucket_name": os.environ["CB_BUCKET"],  # Bucket name in Couchbase
            "scope_name": os.environ["CB_SCOPE"],  # Scope name in Couchbase
            "index_name": os.environ["CB_INDEX_NAME"],  # Index name in Couchbase
        },
        "get_or_create": True,  # Set to False to avoid reusing an existing collection
        "overwrite": False,  # Set to True to overwrite an existing collection (forces index recreation)

        # Option 1: Use OpenAI embedding function (uncomment below to enable)
        # "embedding_function": openai_ef,

        # Option 2: Default embedding model (SentenceTransformers 'all-mpnet-base-v2')
        "embedding_model": "all-mpnet-base-v2",  # Default model if OpenAI embeddings are not used

        # Custom text splitter function
        "custom_text_split_function": recur_spliter.split_text,
    },
    code_execution_config=False,  # Set to True if you want to execute retrieved code
)
```

## Chat Interaction

This section marks the beginning of the chat interaction, using RetrieveChat powered by Couchbase Capella for retrieval-augmented code generation and question answering in AG2.

### Example 1

Use RetrieveChat to help generate sample code, automatically run the code, and fix errors if there are any.

Problem: How to use RetrieveChat Powered by Couchbase Capella for Retrieve Augmented Code Generation and Question Answering in AG2?

Note: You may need to create a vector search index on the cluster before running the query.


```python
assistant.reset()
code_problem = "How to use RetrieveChat Powered by Couchbase Capella for Retrieve Augmented Code Generation and Question Answering in AG2?"
chat_result = ragproxyagent.initiate_chat(assistant, message=ragproxyagent.message_generator, problem=code_problem)
```

**Expected Output**

The notebook that explains how to use Couchbase with AG2 contains the code snippet below, so the RAG agent should return a snippet similar to this.
- -```python -ragproxyagent = RetrieveUserProxyAgent( - name="ragproxyagent", - human_input_mode="NEVER", - max_consecutive_auto_reply=3, - retrieve_config={ - "task": "code", - "docs_path": [ - "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Examples/Integrate%20-%20Spark.md", - "https://raw.githubusercontent.com/microsoft/FLAML/main/website/docs/Research.md", - ], - "chunk_token_size": 2000, - "model": config_list[0]["model"], - "vector_db": "couchbase", # Couchbase Capella VectorDB - "collection_name": "demo_collection", # Couchbase Capella collection name to be utilized/created - "db_config": { - "connection_string": os.environ["CB_CONN_STR"], # Couchbase Capella connection string - "username": os.environ["CB_USERNAME"], # Couchbase Capella username - "password": os.environ["CB_PASSWORD"], # Couchbase Capella password - "bucket_name": "test_db", # Couchbase Capella bucket name - "scope_name": "test_scope", # Couchbase Capella scope name - "index_name": "vector_index", # Couchbase Capella index name to be created - }, - "get_or_create": True, # set to False if you don't want to reuse an existing collection - "overwrite": False, # set to True if you want to overwrite an existing collection, each overwrite will force a index creation and reupload of documents - }, - code_execution_config=False, # set to False if you don't want to execute the code -) -``` diff --git a/tutorial/markdown/generated/vector-search-cookbook/couchbase_persistence_langgraph.md b/tutorial/markdown/generated/vector-search-cookbook/couchbase_persistence_langgraph.md deleted file mode 100644 index b1077ee..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/couchbase_persistence_langgraph.md +++ /dev/null @@ -1,316 +0,0 @@ ---- -# frontmatter -path: "/tutorial-langgraph-persistence-checkpoint" -title: Persist LangGraph State with Couchbase Checkpointer -short_title: Persist LangGraph State with Couchbase -description: - - Learn how to use Checkpointer Library for LangGraph - - Use Couchbase to store the LangGraph states -content_type: tutorial -filter: sdk -technology: - - kv -tags: - - Artificial Intelligence - - LangGraph -sdk_language: - - python -length: 20 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/langgraph/couchbase_persistence_langgraph.ipynb) - -# LangGraph Persistence with Couchbase - -### LangGraph - -LangGraph is a library for building stateful, multi-actor applications with LLMs, used to create agent and multi-agent workflows. Compared to other LLM frameworks, it offers these core benefits: cycles, controllability, and persistence. LangGraph allows you to define flows that involve cycles, essential for most agentic architectures, differentiating it from DAG-based solutions. This tutorial focuses on showcasing persisting state of [LangGraph](https://github.com/langchain-ai/langgraph) with Couchbase. - -### Checkpointer - -Checkpointers in LangGraph save snapshots of graph state at each execution step, enabling memory between interactions, human-in-the-loop workflows, and fault tolerance. By organizing states into "threads" with unique IDs - `thread_id`, they preserve conversation history and allow time travel debugging. Checkpointers implement methods to store, retrieve, and list checkpoints, with various backend options (in-memory, Couchbase, SQLite) to suit different application needs. 
This persistence layer is what enables agents to maintain context across multiple user interactions and recover gracefully from failures.

### Couchbase as a Checkpointer

This tutorial focuses on implementing a LangGraph checkpointer with Couchbase, leveraging Couchbase's distributed architecture, JSON document model, and high availability to provide robust persistence for agent workflows. Couchbase's scalability and flexible query capabilities make it an ideal backend for managing complex conversation states across multiple users and sessions.

# How to use Couchbase Checkpointer

This tutorial uses the dedicated [langgraph-checkpointer-couchbase](https://pypi.org/project/langgraph-checkpointer-couchbase/) package to persist LangGraph state in Couchbase.

This package provides a seamless way to persist LangGraph agent states in Couchbase, enabling:

- State persistence across application restarts
- Retrieval of historical conversation steps
- Continued conversations from previous checkpoints
- Both synchronous and asynchronous interfaces

## Setup environment

This tutorial requires the Couchbase Python SDK and the `langgraph` package.


```python
%%capture --no-stderr
%pip install -U langgraph==0.3.22 langgraph-checkpointer-couchbase
```

This example uses OpenAI's GPT-4.1-mini as the model.


```python
import getpass
import os


def _set_env(var: str):
    if not os.environ.get(var):
        os.environ[var] = getpass.getpass(f"{var}: ")


_set_env("OPENAI_API_KEY")
```

## Setup model and tools for the graph

We will be creating a [ReAct Agent](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/) for this demo. Let's create a custom tool which our agent can call to get more information.

We are using a `get_weather` tool, which returns weather information for a given city. We are also setting up the ChatGPT model here.


```python
from typing import Literal
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent


@tool
def get_weather(city: Literal["nyc", "sf"]):
    """Use this to get weather information."""
    if city == "nyc":
        return "It might be cloudy in nyc"
    elif city == "sf":
        return "It's always sunny in sf"
    else:
        raise AssertionError("Unknown city")


tools = [get_weather]
model = ChatOpenAI(model_name="gpt-4.1-mini", temperature=0)
```

### Couchbase Connection and Initialization

There are two ways to initialize a saver:

1. `from_conn_info` - Provide the connection string, username, and password. The package handles the connection itself.
2. `from_cluster` - Provide an already connected Couchbase `Cluster` object.

We will use `from_conn_info` in the sync tutorial and `from_cluster` in the async one, but either can be used, depending on your requirements.


## Use sync connection (CouchbaseSaver)

Below is the usage of `CouchbaseSaver` (for synchronous use of the graph, i.e. `.invoke()`, `.stream()`). `CouchbaseSaver` implements the four methods that are required for any checkpointer:

- `.put` - Store a checkpoint with its configuration and metadata.
- `.put_writes` - Store intermediate writes linked to a checkpoint (i.e. pending writes).
- `.get_tuple` - Fetch a checkpoint tuple using a given configuration (`thread_id` and `checkpoint_id`).
- `.list` - List checkpoints that match a given configuration and filter criteria.
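Once the walkthrough below has stored checkpoints for a thread, any later process can continue that conversation simply by reopening a saver and reusing the same `thread_id`. The following is a minimal, hypothetical sketch of that pattern, assuming the same local cluster and the `model` and `tools` defined above; the follow-up question is answered from the conversation state restored out of Couchbase.


```python
# Minimal sketch: resume a previously persisted thread. Assumes the setup above
# and that thread "1" already has checkpoints stored by the walkthrough below.
from langgraph_checkpointer_couchbase import CouchbaseSaver

with CouchbaseSaver.from_conn_info(
    cb_conn_str="couchbase://localhost",
    cb_username="Administrator",
    cb_password="password",
    bucket_name="test",
    scope_name="langgraph",
) as checkpointer:
    graph = create_react_agent(model, tools=tools, checkpointer=checkpointer)
    # Reusing thread_id "1" restores the prior messages from Couchbase, so the
    # agent can answer follow-ups that depend on earlier turns.
    config = {"configurable": {"thread_id": "1"}}
    res = graph.invoke({"messages": [("human", "What city did I ask about earlier?")]}, config)
    print(res["messages"][-1].content)
```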
- -Here we will create a Couchbase connection. We are using local setup with bucket `test`, `langgraph` scope. You may change bucket and scope if required. We will also require `checkpoints` and `checkpoint_writes` as collections inside. - -Then a [ReAct Agent](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/) is created with GPT Model, weather tool and Couchbase checkpointer. - -LangGraph's graph is invoked with message for GPT, storing all the state in Couchbase. We use get, get_tuple and list methods to fetch the states again - - -```python -from langgraph_checkpointer_couchbase import CouchbaseSaver - -with CouchbaseSaver.from_conn_info( - cb_conn_str="couchbase://localhost", - cb_username="Administrator", - cb_password="password", - bucket_name="test", - scope_name="langgraph", -) as checkpointer: - graph = create_react_agent(model, tools=tools, checkpointer=checkpointer) - config = {"configurable": {"thread_id": "1"}} - res = graph.invoke({"messages": [("human", "what's the weather in sf")]}, config) - - latest_checkpoint = checkpointer.get(config) - latest_checkpoint_tuple = checkpointer.get_tuple(config) - checkpoint_tuples = list(checkpointer.list(config)) -``` - - -```python -latest_checkpoint -``` - - - - - {'v': 2, - 'ts': '2025-04-22T04:38:11.363745+00:00', - 'id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337', - 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), - AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), - ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe'), - AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 
0}})]}, - 'channel_versions': {'__start__': 2, - 'messages': 5, - 'branch:to:agent': 5, - 'branch:to:tools': 4}, - 'versions_seen': {'__input__': {}, - '__start__': {'__start__': 1}, - 'agent': {'branch:to:agent': 4}, - 'tools': {'branch:to:tools': 3}}, - 'pending_sends': []} - - - - -```python -latest_checkpoint_tuple -``` - - - - - CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:11.363745+00:00', 'id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe'), AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, 'channel_versions': {'__start__': 2, 'messages': 5, 'branch:to:agent': 5, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 4}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 
'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 3, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8237-6166-8002-ac95e49f67e4'}}, pending_writes=[]) - - - - -```python -checkpoint_tuples -``` - - - - - [CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:11.363745+00:00', 'id': '1f01f339-8ab3-6ce0-8003-a475eb1c8337', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe'), AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, 'channel_versions': {'__start__': 2, 'messages': 5, 'branch:to:agent': 5, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 4}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content='The weather in San Francisco is always sunny.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 11, 'prompt_tokens': 
84, 'total_tokens': 95, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgYLdRzuTyLsLvAc3MDV90N8Y9F', 'finish_reason': 'stop', 'logprobs': None}, id='run-8ee450ec-4a15-4ceb-b4ed-49a98c7f9ab5-0', usage_metadata={'input_tokens': 84, 'output_tokens': 11, 'total_tokens': 95, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 3, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8237-6166-8002-ac95e49f67e4'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8237-6166-8002-ac95e49f67e4'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:10.473797+00:00', 'id': '1f01f339-8237-6166-8002-ac95e49f67e4', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe')], 'branch:to:agent': None}, 'channel_versions': {'__start__': 2, 'messages': 4, 'branch:to:agent': 4, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 2}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'tools': {'messages': [ToolMessage(content="It's always sunny in sf", name='get_weather', id='15c7d895-1e5a-4edf-9dfc-5a5f363d6111', tool_call_id='call_zRq0TmgKvcfoaiaRQxd1YlXe')]}}, 'step': 2, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8230-6028-8001-76f23dc92abc'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-8230-6028-8001-76f23dc92abc'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:10.470894+00:00', 'id': '1f01f339-8230-6028-8001-76f23dc92abc', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, 
id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})], 'branch:to:tools': None}, 'channel_versions': {'__start__': 2, 'messages': 3, 'branch:to:agent': 3, 'branch:to:tools': 3}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 2}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'function': {'arguments': '{"city":"sf"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 15, 'prompt_tokens': 57, 'total_tokens': 72, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzgXgi0qlJYegX2Mtz8KC3r9YsD8', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-94b537ab-dcb7-4776-804d-087bf476761a-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'sf'}, 'id': 'call_zRq0TmgKvcfoaiaRQxd1YlXe', 'type': 'tool_call'}], usage_metadata={'input_tokens': 57, 'output_tokens': 15, 'total_tokens': 72, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 1, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-6f17-6fe0-8000-bd10177d2f78'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-6f17-6fe0-8000-bd10177d2f78'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:08.468774+00:00', 'id': '1f01f339-6f17-6fe0-8000-bd10177d2f78', 'channel_values': {'messages': [HumanMessage(content="what's the weather in sf", additional_kwargs={}, response_metadata={}, id='f8fdddb2-5a72-4bf6-84d9-1576d48d6a45')], 'branch:to:agent': None}, 'channel_versions': {'__start__': 2, 'messages': 2, 'branch:to:agent': 2}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': None, 'step': 0, 'parents': {}, 'thread_id': '1'}, parent_config={'configurable': {'thread_id': '1', 'checkpoint_ns': '', 'checkpoint_id': '1f01f339-6f15-6664-bfff-0d88597e0554'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '1', 
'checkpoint_ns': '', 'checkpoint_id': '1f01f339-6f15-6664-bfff-0d88597e0554'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:08.467717+00:00', 'id': '1f01f339-6f15-6664-bfff-0d88597e0554', 'channel_values': {'__start__': {'messages': [['human', "what's the weather in sf"]]}}, 'channel_versions': {'__start__': 1}, 'versions_seen': {'__input__': {}}, 'pending_sends': []}, metadata={'source': 'input', 'writes': {'__start__': {'messages': [['human', "what's the weather in sf"]]}}, 'step': -1, 'parents': {}, 'thread_id': '1'}, parent_config=None, pending_writes=None)] - - - -## Use async connection (AsyncCouchbaseSaver) - -This is the asynchronous example, Here we will create a Couchbase connection. We are using local setup with bucket `test`, `langgraph` scope. We will also require `checkpoints` and `checkpoint_writes` as collections inside. These are the methods supported by the library - -- `.aput` - Store a checkpoint with its configuration and metadata. -- `.aput_writes` - Store intermediate writes linked to a checkpoint (i.e. pending writes). -- `.aget_tuple` - Fetch a checkpoint tuple using a given configuration (`thread_id` and `checkpoint_id`). -- `.alist` - List checkpoints that match a given configuration and filter criteria. - -Then a [ReAct Agent](https://langchain-ai.github.io/langgraph/how-tos/create-react-agent/) is created with GPT Model, weather tool and Couchbase checkpointer. - -LangGraph's graph is invoked with message for GPT, storing all the state in Couchbase. We use aget, aget_tuple and alist methods to fetch the states again - - -```python -# Create Couchbase Cluster Connection -from acouchbase.cluster import Cluster as ACluster -from couchbase.auth import PasswordAuthenticator -from couchbase.options import ClusterOptions - -cb_conn_str = "couchbase://localhost" -cb_username = "Administrator" -cb_password = "password" - -auth = PasswordAuthenticator(cb_username, cb_password) -options = ClusterOptions(auth) -cb_cluster = await ACluster.connect(cb_conn_str, options) -``` - - -```python -from langgraph_checkpointer_couchbase import AsyncCouchbaseSaver - -async with AsyncCouchbaseSaver.from_cluster( - cluster=cb_cluster, - bucket_name="test", - scope_name="langgraph", -) as checkpointer: - graph = create_react_agent(model, tools=tools, checkpointer=checkpointer) - config = {"configurable": {"thread_id": "2"}} - res = await graph.ainvoke( - {"messages": [("human", "what's the weather in nyc")]}, config - ) - - latest_checkpoint = await checkpointer.aget(config) - latest_checkpoint_tuple = await checkpointer.aget_tuple(config) - checkpoint_tuples = [c async for c in checkpointer.alist(config)] -``` - - -```python -latest_checkpoint -``` - - - - - {'v': 2, - 'ts': '2025-04-22T04:38:51.638880+00:00', - 'id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e', - 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), - AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 
'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), - ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt'), - AIMessage(content="The weather in NYC might be cloudy. Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, - 'channel_versions': {'__start__': 2, - 'messages': 5, - 'branch:to:agent': 5, - 'branch:to:tools': 4}, - 'versions_seen': {'__input__': {}, - '__start__': {'__start__': 1}, - 'agent': {'branch:to:agent': 4}, - 'tools': {'branch:to:tools': 3}}, - 'pending_sends': []} - - - - -```python -latest_checkpoint_tuple -``` - - - - - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:51.638880+00:00', 'id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt'), AIMessage(content="The weather in NYC might be cloudy. 
Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, 'channel_versions': {'__start__': 2, 'messages': 5, 'branch:to:agent': 5, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 4}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content="The weather in NYC might be cloudy. Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 3, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0138-6a22-8002-a9a8ec4b7e4e'}}, pending_writes=[]) - - - - -```python -checkpoint_tuples -``` - - - - - [CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:51.638880+00:00', 'id': '1f01f33b-0acb-6cce-8003-7a4f1a2cf90e', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 
'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt'), AIMessage(content="The weather in NYC might be cloudy. Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}, 'channel_versions': {'__start__': 2, 'messages': 5, 'branch:to:agent': 5, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 4}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content="The weather in NYC might be cloudy. Is there anything else you'd like to know?", additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 19, 'prompt_tokens': 88, 'total_tokens': 107, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhCjd46jhDUvB2B01WHfyikY6he', 'finish_reason': 'stop', 'logprobs': None}, id='run-15e27c46-8433-443e-9909-7eea9a1ab2dc-0', usage_metadata={'input_tokens': 88, 'output_tokens': 19, 'total_tokens': 107, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 3, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0138-6a22-8002-a9a8ec4b7e4e'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0138-6a22-8002-a9a8ec4b7e4e'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:50.634902+00:00', 'id': '1f01f33b-0138-6a22-8002-a9a8ec4b7e4e', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 
'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}), ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt')], 'branch:to:agent': None}, 'channel_versions': {'__start__': 2, 'messages': 4, 'branch:to:agent': 4, 'branch:to:tools': 4}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 2}, 'tools': {'branch:to:tools': 3}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'tools': {'messages': [ToolMessage(content='It might be cloudy in nyc', name='get_weather', id='54b58baa-d179-496f-a84c-f55a340874cc', tool_call_id='call_haxMoLeSz5hkuUXcerR45Kqt')]}}, 'step': 2, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0134-6404-8001-5d2722d53ba0'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33b-0134-6404-8001-5d2722d53ba0'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:50.633105+00:00', 'id': '1f01f33b-0134-6404-8001-5d2722d53ba0', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d'), AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})], 'branch:to:tools': None}, 'channel_versions': {'__start__': 2, 'messages': 3, 'branch:to:agent': 3, 'branch:to:tools': 3}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}, 'agent': {'branch:to:agent': 2}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': {'agent': {'messages': [AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'function': {'arguments': '{"city":"nyc"}', 'name': 'get_weather'}, 'type': 'function'}], 'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 16, 'prompt_tokens': 58, 'total_tokens': 74, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': 
{'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4.1-mini-2025-04-14', 'system_fingerprint': 'fp_79b79be41f', 'id': 'chatcmpl-BOzhByznAjlt4j32zfgqTRXfmqNIN', 'finish_reason': 'tool_calls', 'logprobs': None}, id='run-d299d9a0-000f-4a4d-a135-373289ffc9b9-0', tool_calls=[{'name': 'get_weather', 'args': {'city': 'nyc'}, 'id': 'call_haxMoLeSz5hkuUXcerR45Kqt', 'type': 'tool_call'}], usage_metadata={'input_tokens': 58, 'output_tokens': 16, 'total_tokens': 74, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})]}}, 'step': 1, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33a-f6f8-65bc-8000-46d63ff6f934'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33a-f6f8-65bc-8000-46d63ff6f934'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:49.559999+00:00', 'id': '1f01f33a-f6f8-65bc-8000-46d63ff6f934', 'channel_values': {'messages': [HumanMessage(content="what's the weather in nyc", additional_kwargs={}, response_metadata={}, id='9475ef5c-67f8-4fa7-b3f4-606b2d74391d')], 'branch:to:agent': None}, 'channel_versions': {'__start__': 2, 'messages': 2, 'branch:to:agent': 2}, 'versions_seen': {'__input__': {}, '__start__': {'__start__': 1}}, 'pending_sends': []}, metadata={'source': 'loop', 'writes': None, 'step': 0, 'parents': {}, 'thread_id': '2'}, parent_config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33a-f6f5-6790-bfff-47d3da71ee7e'}}, pending_writes=None), - CheckpointTuple(config={'configurable': {'thread_id': '2', 'checkpoint_ns': '', 'checkpoint_id': '1f01f33a-f6f5-6790-bfff-47d3da71ee7e'}}, checkpoint={'v': 2, 'ts': '2025-04-22T04:38:49.558819+00:00', 'id': '1f01f33a-f6f5-6790-bfff-47d3da71ee7e', 'channel_values': {'__start__': {'messages': [['human', "what's the weather in nyc"]]}}, 'channel_versions': {'__start__': 1}, 'versions_seen': {'__input__': {}}, 'pending_sends': []}, metadata={'source': 'input', 'writes': {'__start__': {'messages': [['human', "what's the weather in nyc"]]}}, 'step': -1, 'parents': {}, 'thread_id': '2'}, parent_config=None, pending_writes=None)] - - - - -```python - -``` diff --git a/tutorial/markdown/generated/vector-search-cookbook/hugging_face.md b/tutorial/markdown/generated/vector-search-cookbook/hugging_face.md deleted file mode 100644 index 89e1064..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/hugging_face.md +++ /dev/null @@ -1,655 +0,0 @@ ---- -# frontmatter -path: "/tutorial-huggingface-couchbase-vector-search-with-global-secondary-index" -title: Using Hugging Face Embeddings with Couchbase Vector Search with GSI -short_title: Hugging Face with Couchbase Vector Search with GSI -description: - - Learn how to generate embeddings using Hugging Face and store them in Couchbase. - - This tutorial demonstrates how to use Couchbase's vector search capabilities with Hugging Face embeddings. - - You'll understand how to perform vector search to find relevant documents based on similarity with GSI. 
content_type: tutorial
filter: sdk
technology:
  - vector search
tags:
  - Artificial Intelligence
  - Hugging Face
sdk_language:
  - python
length: 30 Mins
---




[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/huggingface/gsi/hugging_face.ipynb)

# Semantic Search with Couchbase GSI Vector Search and Hugging Face

## Overview

In this guide, we will walk you through building a powerful semantic search engine using Couchbase as the backend database and [Hugging Face](https://huggingface.co/) as the AI-powered embedding model provider. Semantic search goes beyond simple keyword matching by understanding the context and meaning behind the words in a query, making it an essential tool for applications that require intelligent information retrieval.

This tutorial demonstrates how to leverage Couchbase's **Global Secondary Index (GSI) vector search capabilities** with Hugging Face embeddings to create a high-performance semantic search system. GSI vector search in Couchbase offers significant advantages over traditional FTS (Full-Text Search) approaches, particularly for vector-first workloads and scenarios requiring complex filtering with high query-per-second (QPS) performance.

This guide is designed to be comprehensive yet accessible, with clear step-by-step instructions that will equip you with the knowledge to create a fully functional semantic search system. Whether you're building a recommendation engine, content discovery platform, or any application requiring intelligent document retrieval, this tutorial provides the foundation you need.

**Note**: If you want to perform semantic search using the FTS (Full-Text Search) index instead, please take a look at [this alternative approach](https://developer.couchbase.com/tutorial-huggingface-couchbase-vector-search-with-fts).

## How to Run This Tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/huggingface/gsi/hugging_face.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.
- -## Setup and Installation - -### Install Necessary Libraries - - -```python -!pip install --quiet langchain-couchbase==0.5.0 transformers==4.56.1 sentence_transformers==5.1.0 langchain_huggingface==0.3.1 python-dotenv==1.1.1 ipywidgets -``` - -### Import Required Modules - - -```python -from pathlib import Path -from datetime import timedelta -from transformers import pipeline, AutoModel, AutoTokenizer -from langchain_huggingface.embeddings.huggingface import HuggingFaceEmbeddings -from couchbase.auth import PasswordAuthenticator -from couchbase.cluster import Cluster -from couchbase.options import ClusterOptions -from langchain_core.globals import set_llm_cache -from langchain_couchbase.cache import CouchbaseCache -from langchain_couchbase.vectorstores import CouchbaseQueryVectorStore -from langchain_couchbase.vectorstores import DistanceStrategy -from langchain_couchbase.vectorstores import IndexType -import getpass -import os -from dotenv import load_dotenv -``` - -### Prerequisites - -To run this tutorial successfully, you will need the following requirements: - -#### Couchbase Requirements - -**Version Requirements:** -- **Couchbase Server 8.0+** or **Couchbase Capella** with Query Service enabled -- Note: GSI vector search is a newer feature that requires Couchbase Server 8.0 or above, unlike FTS-based vector search which works with 7.6+ - -**Access Requirements:** -- A configured Bucket, Scope, and Collection -- User credentials with **Read and Write** access to your target collection -- Network connectivity to your Couchbase cluster - -#### Create and Deploy Your Free Tier Operational Cluster on Capella - -To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint. - -To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html). - -#### Couchbase Capella Configuration - -When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met: - -* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) to access the required bucket (Read and Write) used in the application. -* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running. 
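If you prefer not to enter credentials interactively, you can create a `.env` file alongside the notebook; the configuration cell below loads it via `python-dotenv`. The variable names here match those read by that cell, and every value shown is a placeholder to replace with your own details.


```
CB_CLUSTER_URL=couchbases://your-cluster.cloud.couchbase.com
CB_USERNAME=your_username
CB_PASSWORD=your_password
CB_BUCKET=your_bucket
CB_SCOPE=your_scope
CB_COLLECTION=your_collection
```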
#### Python Environment Requirements

- **Python 3.8+**
- Required Python packages (installed via pip above):
  - `langchain-couchbase==0.5.0`
  - `transformers==4.56.1`
  - `sentence_transformers==5.1.0`
  - `langchain_huggingface==0.3.1`


```python
# Load environment variables
load_dotenv("./.env")

# Configuration
couchbase_cluster_url = os.getenv('CB_CLUSTER_URL') or input("Couchbase Cluster URL:")
couchbase_username = os.getenv('CB_USERNAME') or input("Couchbase Username:")
couchbase_password = os.getenv('CB_PASSWORD') or getpass.getpass("Couchbase password:")
couchbase_bucket = os.getenv('CB_BUCKET') or input("Couchbase Bucket:")
couchbase_scope = os.getenv('CB_SCOPE') or input("Couchbase Scope:")
couchbase_collection = os.getenv('CB_COLLECTION') or input("Couchbase Collection:")
```

## Couchbase Connection Setup

### Create Authentication Object

In this section, we first create a `PasswordAuthenticator` object that holds our Couchbase credentials:


```python
auth = PasswordAuthenticator(
    couchbase_username,
    couchbase_password
)
```

### Connect to Cluster

Then, we use this object to connect to the Couchbase cluster and select the bucket, scope, and collection specified above:


```python
print("Connecting to cluster at URL: " + couchbase_cluster_url)
cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))

bucket = cluster.bucket(couchbase_bucket)
scope = bucket.scope(couchbase_scope)
collection = scope.collection(couchbase_collection)
print("Connected to the cluster")
```

    Connecting to cluster at URL: couchbase://localhost
    Connected to the cluster


## Understanding GSI Vector Search

### Optimizing Vector Search with Global Secondary Index (GSI)

With Couchbase 8.0+, you can leverage the power of GSI-based vector search, which offers significant performance improvements over traditional Full-Text Search (FTS) approaches for vector-first workloads. GSI vector search provides high-performance vector similarity search with advanced filtering capabilities and is designed to scale to billions of vectors.
- -#### GSI vs FTS: Choosing the Right Approach - -| Feature | GSI Vector Search | FTS Vector Search | -| --------------------- | --------------------------------------------------------------- | ----------------------------------------- | -| **Best For** | Vector-first workloads, complex filtering, high QPS performance| Hybrid search and high recall rates | -| **Couchbase Version** | 8.0.0+ | 7.6+ | -| **Filtering** | Pre-filtering with `WHERE` clauses (Composite) or post-filtering (BHIVE) | Pre-filtering with flexible ordering | -| **Scalability** | Up to billions of vectors (BHIVE) | Up to 10 million vectors | -| **Performance** | Optimized for concurrent operations with low memory footprint | Good for mixed text and vector queries | - -#### GSI Vector Index Types - -Couchbase offers two distinct GSI vector index types, each optimized for different use cases: - -##### Hyperscale Vector Indexes (BHIVE) - -- **Best for**: Pure vector searches like content discovery, recommendations, and semantic search -- **Use when**: You primarily perform vector-only queries without complex scalar filtering -- **Features**: - - High performance with low memory footprint - - Optimized for concurrent operations - - Designed to scale to billions of vectors - - Supports post-scan filtering for basic metadata filtering - -##### Composite Vector Indexes - -- **Best for**: Filtered vector searches that combine vector similarity with scalar value filtering -- **Use when**: Your queries combine vector similarity with scalar filters that eliminate large portions of data -- **Features**: - - Efficient pre-filtering where scalar attributes reduce the vector comparison scope - - Best for well-defined workloads requiring complex filtering using GSI features - - Supports range lookups combined with vector search - -#### Index Type Selection for This Tutorial - -In this tutorial, we'll demonstrate creating a **BHIVE index** and running vector similarity queries using GSI. BHIVE is ideal for semantic search scenarios where you want: - -1. **High-performance vector search** across large datasets -2. **Low latency** for real-time applications -3. **Scalability** to handle growing vector collections -4. **Concurrent operations** for multi-user environments - -The BHIVE index will provide optimal performance for our Hugging Face embedding-based semantic search implementation. - -#### Alternative: Composite Vector Index - -If your use case requires complex filtering with scalar attributes, you may want to consider using a **Composite Vector Index** instead: - -```python -# Alternative: Create a Composite index for filtered searches -vector_store.create_index( - index_type=IndexType.COMPOSITE, - index_description="IVF,SQ8", - distance_metric=DistanceStrategy.COSINE, - index_name="huggingface_composite_index", -) -``` - -**Use Composite indexes when:** -- You need to filter by document metadata or attributes before vector similarity -- Your queries combine vector search with WHERE clauses -- You have well-defined filtering requirements that can reduce the search space - -**Note**: Composite indexes enable pre-filtering with scalar attributes, making them ideal for applications where you need to search within specific categories, date ranges, or user-specific data segments. - -#### Understanding GSI Index Configuration (Couchbase 8.0 Feature) - -Before creating our BHIVE index, it's important to understand the configuration parameters that optimize vector storage and search performance. 
The `index_description` parameter controls how Couchbase optimizes vector storage through centroids and quantization.

##### Index Description Format: `IVF[<centroids>],{SQ<bits>|PQ<subquantizers>x<bits>}`

###### Centroids (IVF - Inverted File)

- Controls how the dataset is subdivided for faster searches
- **More centroids** = faster search, slower training time
- **Fewer centroids** = slower search, faster training time
- If omitted (as in `IVF,SQ8`), Couchbase auto-selects based on dataset size

###### Quantization Options

**Scalar Quantization (SQ):**
- `SQ4`, `SQ6`, `SQ8` (4, 6, or 8 bits per dimension)
- Lower memory usage, faster search, slightly reduced accuracy

**Product Quantization (PQ):**
- Format: `PQ<subquantizers>x<bits>` (e.g., `PQ32x8`)
- Better compression for very large datasets
- More complex but can maintain accuracy with a smaller index size

###### Common Configuration Examples

- **`IVF,SQ8`** - Auto centroids, 8-bit scalar quantization (good default)
- **`IVF1000,SQ6`** - 1000 centroids, 6-bit scalar quantization
- **`IVF,PQ32x8`** - Auto centroids, 32 subquantizers with 8 bits each

For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/cloud/vector-index/hyperscale-vector-index.html#algo_settings).

For more information on GSI vector indexes, see [Couchbase GSI Vector Documentation](https://docs.couchbase.com/cloud/vector-index/use-vector-indexes.html).

##### Our Configuration Choice

In this tutorial, we use `IVF,SQ8`, which provides:
- **Auto-selected centroids** optimized for our dataset size
- **8-bit scalar quantization** for a good balance of speed, memory usage, and accuracy
- **COSINE distance metric** ideal for semantic similarity search
- **Optimal performance** for most semantic search use cases


```python
# Create the vector store with Hugging Face embeddings.
# Note: this only sets up the store; the BHIVE GSI index itself
# is created later (see Phase 2) via create_index() with IVF,SQ8.
vector_store = CouchbaseQueryVectorStore(
    cluster=cluster,
    bucket_name=couchbase_bucket,
    scope_name=couchbase_scope,
    collection_name=couchbase_collection,
    embedding=HuggingFaceEmbeddings(), # Hugging Face Initialization
    distance_metric=DistanceStrategy.COSINE
)
```

## Document Processing and Embedding

### Embedding Documents

Now that we have set up our vector store with Hugging Face embeddings, we can add documents to our collection. The `CouchbaseQueryVectorStore` automatically handles the embedding generation process using the Hugging Face transformers library.

#### Understanding the Embedding Process

When we add text documents to our vector store, several important processes happen automatically:

1. **Text Preprocessing**: The input text is preprocessed and tokenized according to the Hugging Face model's requirements
2. **Vector Generation**: Each document is converted into a high-dimensional vector (embedding) that captures its semantic meaning
3. **Storage**: The embeddings are stored in Couchbase along with the original text and any metadata
4. **Indexing**: The vectors are indexed using our BHIVE GSI index for efficient similarity search

#### Adding Sample Documents

In this example, we're adding sample documents that demonstrate Couchbase's capabilities. The system will:
- Generate embeddings for each text document using the Hugging Face model
- Store them in our Couchbase collection
- Make them immediately available for semantic search once the GSI index is ready

**Note**: The `batch_size` parameter controls how many documents are processed together, which can help optimize performance for large document sets.
- - -```python -texts = [ - "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.", - "It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.", - input("Enter custom embedding text:") -] -vector_store.add_texts(texts=texts, batch_size=32) -``` - - - - - ['7c601881e4bf4c53b5b4c2a25628d904', - '0442f351aec2415481138315d492ee80', - 'e20a8dcd8b464e8e819b87c9a0ff05c3'] - - - -## Vector Search Performance Optimization - -Now let's demonstrate the performance benefits of different optimization approaches available in Couchbase. We'll compare three optimization levels to show how each contributes to building a production-ready semantic search system: - -1. **Baseline (Raw Search)**: Basic vector similarity search without GSI optimization -2. **GSI-Optimized Search**: High-performance search using BHIVE GSI index -3. **Cache Benefits**: Show how caching can be applied on top of any search approach - -**Important**: Caching is orthogonal to index types - you can apply caching benefits to both raw searches and GSI-optimized searches to improve repeated query performance. - -### Understanding Vector Search Results - -Before we start our RAG comparisons, let's understand what the search results mean: - -When you perform a search query with vector search: - -1. **Query Embedding**: Your search text is converted into a vector embedding using the Hugging Face model -2. **Vector Similarity Calculation**: The system compares your query vector against all stored document vectors -3. **Distance Computation**: Using the COSINE distance metric, the system calculates similarity distances -4. **Result Ranking**: Documents are ranked by their distance values (lower = more similar) -5. **Post-processing**: Results include both the document content and metadata - -**Note**: The returned value represents the vector distance between query and document embeddings. Lower distance values indicate higher similarity. - -### RAG Search Function - -Let's create a comprehensive search function for our RAG performance comparison: - - -```python -import time - -def search_with_performance_metrics(query_text, stage_name, k=3): - """Perform optimized semantic search with detailed performance metrics""" - print(f"\n=== {stage_name.upper()} ===") - print(f"Query: \"{query_text}\"") - - start_time = time.time() - results = vector_store.similarity_search_with_score(query_text, k=k) - end_time = time.time() - - search_time = end_time - start_time - print(f"Search Time: {search_time:.4f} seconds") - print(f"Results Found: {len(results)} documents") - - for i, (doc, distance) in enumerate(results, 1): - print(f"\n[Result {i}]") - print(f"Vector Distance: {distance:.6f} (lower = more similar)") - # Use the document content directly from search results (no additional KV call needed) - print(f"Document Content: {doc.page_content}") - if hasattr(doc, 'metadata') and doc.metadata: - print(f"Metadata: {doc.metadata}") - - return search_time, results -``` - -### Phase 1: Baseline Performance (Raw Vector Search) - -First, let's establish baseline performance with raw vector search - no GSI optimization yet: - - -```python -test_query = "What are the key features of a scalable NoSQL database?" 
-print("Testing baseline performance without GSI optimization...") -baseline_time, baseline_results = search_with_performance_metrics( - test_query, "Phase 1: Baseline Vector Search" -) -``` - - Testing baseline performance without GSI optimization... - - === PHASE 1: BASELINE VECTOR SEARCH === - Query: "What are the key features of a scalable NoSQL database?" - Search Time: 0.1484 seconds - Results Found: 3 documents - - [Result 1] - Vector Distance: 0.586197 (lower = more similar) - Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. - - [Result 2] - Vector Distance: 0.645435 (lower = more similar) - Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more. - - [Result 3] - Vector Distance: 0.976888 (lower = more similar) - Document Content: this is a sample text with the data "hello" - - -### Phase 2: Create BHIVE GSI Index and Test Performance - -Now let's create the BHIVE GSI index and measure the performance improvement: - - -```python -# Create BHIVE index for optimized vector search -print("Creating BHIVE GSI vector index...") -try: - vector_store.create_index( - index_type=IndexType.BHIVE, - index_description="IVF,SQ8", - distance_metric=DistanceStrategy.COSINE, - index_name="huggingface_bhive_index", - ) - print("✓ BHIVE GSI vector index created successfully!") - - # Wait for index to become available - print("Waiting for index to become available...") - time.sleep(3) - -except Exception as e: - if "already exists" in str(e).lower(): - print("✓ BHIVE GSI vector index already exists, proceeding...") - else: - print(f"Error creating GSI index: {str(e)}") - -# Test the same query with GSI optimization -print("\nTesting performance with BHIVE GSI optimization...") -gsi_time, gsi_results = search_with_performance_metrics( - test_query, "Phase 2: GSI-Optimized Search" -) -``` - - Creating BHIVE GSI vector index... - ✓ BHIVE GSI vector index created successfully! - Waiting for index to become available... - - Testing performance with BHIVE GSI optimization... - - === PHASE 2: GSI-OPTIMIZED SEARCH === - Query: "What are the key features of a scalable NoSQL database?" - Search Time: 0.0848 seconds - Results Found: 3 documents - - [Result 1] - Vector Distance: 0.586197 (lower = more similar) - Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. - - [Result 2] - Vector Distance: 0.645435 (lower = more similar) - Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more. - - [Result 3] - Vector Distance: 0.976888 (lower = more similar) - Document Content: this is a sample text with the data "hello" - - -### Phase 3: Demonstrate Cache Benefits - -Now let's show how caching can improve performance for repeated queries. **Note**: Caching benefits apply to both raw searches and GSI-optimized searches. 
-


```python
# Set up Couchbase cache (can be applied to any search approach)
print("Setting up Couchbase cache for improved performance on repeated queries...")
cache = CouchbaseCache(
    cluster=cluster,
    bucket_name=couchbase_bucket,
    scope_name=couchbase_scope,
    collection_name=couchbase_collection,
)
set_llm_cache(cache)
print("✓ Couchbase cache enabled!")

# Test cache benefits by running the same query twice
# (the second execution should show an improvement)
cache_query = "How does a distributed database handle high-speed operations?"

print("\nTesting cache benefits with a different query...")
print("First execution (cache miss):")
cache_time_1, _ = search_with_performance_metrics(
    cache_query, "Phase 3a: First Query (Cache Miss)", k=2
)

print("\nSecond execution (cache hit):")
cache_time_2, _ = search_with_performance_metrics(
    cache_query, "Phase 3b: Repeated Query (Cache Hit)", k=2
)
```

    Setting up Couchbase cache for improved performance on repeated queries...
    ✓ Couchbase cache enabled!
    
    Testing cache benefits with a different query...
    First execution (cache miss):
    
    === PHASE 3A: FIRST QUERY (CACHE MISS) ===
    Query: "How does a distributed database handle high-speed operations?"
    Search Time: 0.1024 seconds
    Results Found: 2 documents
    
    [Result 1]
    Vector Distance: 0.632770 (lower = more similar)
    Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.
    
    [Result 2]
    Vector Distance: 0.677951 (lower = more similar)
    Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
    
    Second execution (cache hit):
    
    === PHASE 3B: REPEATED QUERY (CACHE HIT) ===
    Query: "How does a distributed database handle high-speed operations?"
    Search Time: 0.0289 seconds
    Results Found: 2 documents
    
    [Result 1]
    Vector Distance: 0.632770 (lower = more similar)
    Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.
    
    [Result 2]
    Vector Distance: 0.677951 (lower = more similar)
    Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.
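Before comparing the numbers, it can be worth confirming that the BHIVE index created in Phase 2 is actually online, since queries fall back to slower scans while an index is still building. Below is a minimal sketch using the `cluster` object from earlier; the index name matches the one passed to `create_index` above.


```python
from couchbase.options import QueryOptions

# Look up the index in the system catalog; its state should be
# "online" once it has been built and is serving queries.
rows = cluster.query(
    "SELECT name, state FROM system:indexes WHERE name = $name",
    QueryOptions(named_parameters={"name": "huggingface_bhive_index"}),
)
for row in rows:
    print(row)
```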
- - -### Complete Performance Analysis - -Let's analyze the complete performance improvements across all optimization levels: - - -```python -print("\n" + "="*80) -print("VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY") -print("="*80) - -print(f"Phase 1 - Baseline (Raw Search): {baseline_time:.4f} seconds") -print(f"Phase 2 - GSI-Optimized Search: {gsi_time:.4f} seconds") -print(f"Phase 3 - Cache Benefits:") -print(f" First execution (cache miss): {cache_time_1:.4f} seconds") -print(f" Second execution (cache hit): {cache_time_2:.4f} seconds") - -print("\n" + "-"*80) -print("OPTIMIZATION IMPACT ANALYSIS:") -print("-"*80) - -# GSI improvement analysis -if gsi_time and baseline_time and gsi_time < baseline_time: - gsi_speedup = baseline_time / gsi_time - gsi_improvement = ((baseline_time - gsi_time) / baseline_time) * 100 - print(f"GSI Index Benefit: {gsi_speedup:.2f}x faster ({gsi_improvement:.1f}% improvement)") -else: - print(f"GSI Index Benefit: Performance similar to baseline (may vary with dataset size)") - -# Cache improvement analysis -if cache_time_2 and cache_time_1 and cache_time_2 < cache_time_1: - cache_speedup = cache_time_1 / cache_time_2 - cache_improvement = ((cache_time_1 - cache_time_2) / cache_time_1) * 100 - print(f"Cache Benefit: {cache_speedup:.2f}x faster ({cache_improvement:.1f}% improvement)") -else: - print(f"Cache Benefit: No significant improvement (results may be cached already)") - -print(f"\nKey Insights:") -print(f"• GSI optimization provides consistent performance benefits, especially with larger datasets") -print(f"• Caching benefits apply to both raw and GSI-optimized searches") -print(f"• Combined GSI + Cache provides the best performance for production applications") -print(f"• BHIVE indexes scale to billions of vectors with optimized concurrent operations") -``` - - - ================================================================================ - VECTOR SEARCH PERFORMANCE OPTIMIZATION SUMMARY - ================================================================================ - Phase 1 - Baseline (Raw Search): 0.1484 seconds - Phase 2 - GSI-Optimized Search: 0.0848 seconds - Phase 3 - Cache Benefits: - First execution (cache miss): 0.1024 seconds - Second execution (cache hit): 0.0289 seconds - - -------------------------------------------------------------------------------- - OPTIMIZATION IMPACT ANALYSIS: - -------------------------------------------------------------------------------- - GSI Index Benefit: 1.75x faster (42.8% improvement) - Cache Benefit: 3.55x faster (71.8% improvement) - - Key Insights: - • GSI optimization provides consistent performance benefits, especially with larger datasets - • Caching benefits apply to both raw and GSI-optimized searches - • Combined GSI + Cache provides the best performance for production applications - • BHIVE indexes scale to billions of vectors with optimized concurrent operations - - -### Interactive Testing - -Try your own queries with the optimized search system: - - -```python -custom_query = input("Enter your search query: ") -search_with_performance_metrics(custom_query, "Interactive GSI-Optimized Search") - -``` - - - === INTERACTIVE GSI-OPTIMIZED SEARCH === - Query: "What is the sample data?" 
- Search Time: 0.0812 seconds - Results Found: 3 documents - - [Result 1] - Vector Distance: 0.623644 (lower = more similar) - Document Content: this is a sample text with the data "hello" - - [Result 2] - Vector Distance: 0.860599 (lower = more similar) - Document Content: It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more. - - [Result 3] - Vector Distance: 0.909207 (lower = more similar) - Document Content: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable. - - - - - - (0.08118820190429688, - [(Document(id='e20a8dcd8b464e8e819b87c9a0ff05c3', metadata={}, page_content='this is a sample text with the data "hello"'), - 0.6236441411684932), - (Document(id='0442f351aec2415481138315d492ee80', metadata={}, page_content='It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.'), - 0.8605992009935179), - (Document(id='7c601881e4bf4c53b5b4c2a25628d904', metadata={}, page_content='Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.'), - 0.9092065785676496)]) - - - -## Conclusion - -You have successfully built a powerful semantic search engine using Couchbase's GSI vector search capabilities and Hugging Face embeddings. This guide has walked you through the complete process of creating a high-performance vector search system that can scale to handle billions of documents. diff --git a/tutorial/markdown/generated/vector-search-cookbook/memGpt_letta.md b/tutorial/markdown/generated/vector-search-cookbook/memGpt_letta.md deleted file mode 100644 index ac0c8ff..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/memGpt_letta.md +++ /dev/null @@ -1,468 +0,0 @@ ---- -# frontmatter -path: "/tutorial-letta-memgpt-agents" -title: Use Couchbase Vector Store with Letta Agents -short_title: Use Couchbase with Letta -description: - - Learn how to use Letta Agents - - Upload Couchbase Search Tool to Letta - - Letta Agents with Couchbase -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence -sdk_language: - - python -length: 20 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/memGpt_letta/memGpt_letta.ipynb) - -# How to build Agents with Couchbase, LangChain and Letta! - -This repository will illustrate the core concepts behind [Letta](https://www.letta.com/), a new startup built from the success of [MemGPT](https://memgpt.ai/) from researchers at UC Berkeley. - -Letta helps you build Agents with **meta memory**. - -In addition to the prompt sent to an LLM, the LLM will also be provided with a short description of its internal knowledge about the user and their interests derived from previous chats. - -There are 3 parts to this notebook: - -## 1. Getting Started with Letta Agents - -An illustration of the basic setup requirements and a simple example of an Agent with internal memory. - -## 2. Upload Couchbase Search Tool to Letta - -Wrap Couchbase Vector Search in a Python function and upload it to the Letta Tool server. 
We will use [LangChain](https://python.langchain.com/docs/integrations/providers/couchbase/) to create a function that uses the Couchbase Vector Store. This tool focuses on providing RAG capabilities for a particular collection. We can create multiple such functions, or a single function that takes the collection name as a parameter, depending on the use case.

## 3. Letta Agent with Couchbase

Construct a Letta Agent that chats with a climate report indexed in Couchbase. Observe how the Letta Agent develops an internal model of the chat user and the climate report through these interactions.

# 1. Getting Started with Letta Agents

### Primer: Install Letta Server

`$ pip install -U letta`

Start the local server (you may need to export your OpenAI API key before running this command for this tutorial to work):

`$ letta server`



For more details, check out the README of the Letta repo:
https://github.com/letta-ai/letta



### Connect to Letta Client

Connect to the Letta server; the default port for the Letta client is 8283. This may change based on your configuration.


```python
from letta import create_client

# connect to the letta server
letta_client = create_client(base_url="http://localhost:8283")


```


```python
# get this from your local GUI
my_letta_agent_id = "agent-ba1cee1d-1f31-4619-8ee2-557eb844a063"

# send a message to the agent
response = letta_client.send_message(
    agent_id=my_letta_agent_id,
    role="user",
    message="By 2100, what is the projected global warming rise possible if no actions are taken?"
)
```


```python
# Print the internal monologue
print("Internal Monologue:")
print(response.messages[0].internal_monologue)
print()

# Print the message sent
print("Message Sent:")
print(response.messages[1].function_call.arguments)
print()

# Print the function return status
print("Function Return Status:")
print(f"Status: {response.messages[2].status}")
print(f"Time: {response.messages[2].function_return}")
print()

# Print usage statistics
print("Usage Statistics:")
print(f"Completion tokens: {response.usage.completion_tokens}")
print(f"Prompt tokens: {response.usage.prompt_tokens}")
print(f"Total tokens: {response.usage.total_tokens}")
print(f"Step count: {response.usage.step_count}")
```

    Internal Monologue:
    His concern about global climate trends is apparent. While providing the information, I must stress the gravity and urgency of environmental preservation.
    
    Message Sent:
    {
      "message": "This is indeed an issue of utmost importance, Chad. If no substantial actions are taken to mitigate climate change, scientists predict that global temperatures could increase by more than 4 degrees Celsius by the end of the 21st century compared to pre-industrial levels. However, it's important to note that projections can vary based on different climate models and scenarios. The current consensus underscores the need for immediate and sustained action to limit warming to within manageable levels."
    }
    
    Function Return Status:
    Status: success
    Time: {
      "status": "OK",
      "message": "None",
      "time": "2024-11-15 10:23:55 AM IST+0530"
    }
    
    Usage Statistics:
    Completion tokens: 136
    Prompt tokens: 6199
    Total tokens: 6335
    Step count: 1


# 2. Upload Couchbase Search Tool indexed on Climate Report to Letta

We are using a publicly available IPCC Climate Change Report. You can get it here: https://www.ipcc.ch/report/ar6/syr/downloads/report/IPCC_AR6_SYR_LongerReport.pdf .
To learn how to index a PDF into Couchbase for vector search, please follow this tutorial: https://developer.couchbase.com/tutorial-python-langchain-pdf-chat



```python
import os

os.environ["OPENAI_API_KEY"] = "sk-xxxxxxx"
os.environ["CB_CONN_STR"] = "couchbase://localhost"
os.environ["CB_BUCKET"] = "pdf-docs"
os.environ["CB_SCOPE"] = "letta"
os.environ["CB_COLLECTION"] = "climate_report"
os.environ["CB_USERNAME"] = "Administrator"
os.environ["CB_PASSWORD"] = "password"
os.environ["SEARCH_INDEX"] = "climate_index"
```


```python
def ask_climate_search(
    self,
    search_query: str
):
    """
    Search the Couchbase database for climate-related questions using the climate report.
    It uses the search engine to find the most relevant results related to climate and its impacts.
    All climate-related questions should be asked using this function.

    Args:
        search_query (str): The search query for the climate related questions.

    Returns:
        search_results (str): The results from the search engine.
    """
    from couchbase.cluster import Cluster, ClusterOptions
    from couchbase.auth import PasswordAuthenticator
    from datetime import timedelta
    from langchain_openai import OpenAIEmbeddings
    from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore
    import os
    from pydantic import SecretStr


    conn_str = os.getenv("CB_CONN_STR") or "couchbase://localhost"
    bucket_name = os.getenv("CB_BUCKET") or "pdf-docs"
    scope_name = os.getenv("CB_SCOPE") or "letta"
    collection_name = os.getenv("CB_COLLECTION") or "climate_report"
    username = os.getenv("CB_USERNAME") or "Administrator"
    password = os.getenv("CB_PASSWORD") or "password"
    search_index = os.getenv("SEARCH_INDEX") or "climate_index"
    openai_api_key = os.getenv("OPENAI_API_KEY") or "sk-xxxxxxx"

    # Connect to the cluster
    auth = PasswordAuthenticator(username, password)
    options = ClusterOptions(auth)
    cluster = Cluster(conn_str, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    embedding = OpenAIEmbeddings(api_key=SecretStr(openai_api_key))
    cb_vector_store = CouchbaseSearchVectorStore(
        cluster,
        bucket_name,
        scope_name,
        collection_name,
        embedding,
        search_index,
    )

    query_result = cb_vector_store.similarity_search(search_query, 5)
    formatted_results = "\n".join(
        [f"[Search Result {i+1}] {str(result.page_content)}" for i, result in enumerate(query_result)]
    )
    return formatted_results
```


```python
couchbase_search_tool = letta_client.create_tool(
    ask_climate_search,
    tags=["search", "climate", "weather", "impact"],
)
```


```python
letta_client.list_tools()
```

# 3. Letta Agent with Couchbase


```python
from letta import LLMConfig, EmbeddingConfig
# set default llm config for agents
letta_client.set_default_llm_config(
    LLMConfig.default_config(model_name="gpt-4o-mini")
)

# set default embedding config for agents
letta_client.set_default_embedding_config(
    EmbeddingConfig.default_config(model_name="text-embedding-ada-002")
)

agent_state = letta_client.create_agent(
    name="climate-agent",
    tools=[couchbase_search_tool.name],
)

new_agent_id = agent_state.id
```


```python
response = letta_client.send_message(
    agent_id=new_agent_id,
    role="user",
    message="By 2100, what is the projected global warming rise possible if no actions are taken?"
-) - -print(response) -``` - - { - "messages": [ - { - "id": "message-b8555891-125f-4cb6-9f18-b55c0f7ff246", - "date": "2024-11-15T04:41:41+00:00", - "message_type": "internal_monologue", - "internal_monologue": "The user is curious about the implications of climate inaction. I should provide a well-researched answer about global warming projections." - }, - { - "id": "message-b8555891-125f-4cb6-9f18-b55c0f7ff246", - "date": "2024-11-15T04:41:41+00:00", - "message_type": "function_call", - "function_call": { - "name": "ask_climate_search", - "arguments": "{\n \"search_query\": \"global warming rise by 2100\",\n \"request_heartbeat\": true\n}", - "function_call_id": "call_e9JFmCstpfODePsSUaOf8tZA" - } - }, - { - "id": "message-bf02e7e0-3fe7-4f2e-8bc2-2ce5df10e7d0", - "date": "2024-11-15T04:41:41+00:00", - "message_type": "function_return", - "function_return": "{\n \"status\": \"OK\",\n \"message\": \"[Search Result 1] 98\\nSection 4\\nSection 1Section 4\\nGlobal warming will continue to increase in the near term (2021–2040) \\nmainly due to increased cumulative CO 2 emissions in nearly all \\nconsidered scenarios and pathways. In the near term, every \\nregion in the world is projected to face further increases in \\nclimate hazards ( medium to high confidence, depending on \\nregion and hazard), increasing multiple risks to ecosystems \\nand humans ( very high confidence ). In the near term, natural \\nvariability149 will modulate human-caused changes, either attenuating \\nor amplifying projected changes, especially at regional scales, with little \\neffect on centennial global warming. Those modulations are important \\nto consider in adaptation planning. Global surface temperature in any \\nsingle year can vary above or below the long-term human-induced \\ntrend, due to natural variability. By 2030, global surface temperature \\nin any individual year could exceed 1.5°C relative to 1850–1900 with a \\nprobability between 40% and 60%, across the five scenarios assessed \\nin WGI (medium confidence ). The occurrence of individual years with \\nglobal surface temperature change above a certain level does not \\nimply that this global warming level has been reached. If a large \\nexplosive volcanic eruption were to occur in the near term 150 , it \\nwould temporarily and partially mask human-caused climate change \\nby reducing global surface temperature and precipitation, especially\\n[Search Result 2] Global warming will continue to increase in the near term in \\nnearly all considered scenarios and modelled pathways. Deep, \\nrapid, and sustained GHG emissions reductions, reaching net \\nzero CO2 emissions and including strong emissions reductions \\nof other GHGs, in particular CH4, are necessary to limit warming \\nto 1.5°C (>50%) or less than 2°C (>67%) by the end of century \\n(high confidence ). The best estimate of reaching 1.5°C of global \\nwarming lies in the first half of the 2030s in most of the considered \\nscenarios and modelled pathways114. In the very low GHG emissions \\nscenario ( SSP1-1.9), CO2 emissions reach net zero around 2050 and the \\nbest-estimate end-of-century warming is 1.4°C, after a temporary overshoot \\n(see Section 3.3.4) of no more than 0.1°C above 1.5°C global warming. \\nGlobal warming of 2°C will be exceeded during the 21st century unless \\ndeep reductions in CO2 and other GHG emissions occur in the coming \\ndecades. 
Deep, rapid, and sustained reductions in GHG emissions would \\nlead to improvements in air quality within a few years, to reductions in \\ntrends of global surface temperature discernible after around 20 years, \\nand over longer time periods for many other climate impact-drivers 115 \\n(high confidence ). Targeted reductions of air pollutant emissions lead \\nto more rapid improvements in air quality compared to reductions \\nin GHG emissions only, but in the long term, further improvements are\\n[Search Result 3] and risks for coastal ecosystems, people and infrastructure will continue \\nto increase beyond 2100 ( high confidence ). At sustained warming \\nlevels between 2°C and 3°C, the Greenland and West Antarctic ice \\nsheets will be lost almost completely and irreversibly over multiple \\nmillennia (limited evidence). The probability and rate of ice mass loss \\nincrease with higher global surface temperatures ( high confidence ). \\nOver the next 2000 years, global mean sea level will rise by about \\n2 to 3 m if warming is limited to 1.5°C and 2 to 6 m if limited to 2°C \\n(low confidence). Projections of multi-millennial global mean sea level \\nrise are consistent with reconstructed levels during past warm climate \\nperiods: global mean sea level was very likely 5 to 25 m higher than today \\nroughly 3 million years ago, when global temperatures were 2.5°C to \\n4°C higher than 1850–1900 ( medium confidence ). Further examples \\nof unavoidable changes in the climate system due to multi-decadal \\nor longer response timescales include continued glacier melt ( very high \\nconfidence) and permafrost carbon loss (high confidence). {WGI SPM B.5.2, \\nWGI SPM B.5.3, WGI SPM B.5.4, WGI SPM C.2.5, WGI Box TS.4, \\nWGI Box TS.9, WGI 9.5.1; WGII TS C.5; SROCC SPM B.3, SROCC SPM B.6, \\nSROCC SPM B.9} (Figure 3.4)\\nThe probability of low- likelihood outcomes associated with \\npotentially very large impacts increases with higher global \\nwarming levels (high confidence). Warming substantially above the\\n[Search Result 4] potentially very large impacts increases with higher global \\nwarming levels (high confidence). Warming substantially above the \\nassessed very likely range for a given scenario cannot be ruled out, and \\nthere is high confidence this would lead to regional changes greater \\nthan assessed in many aspects of the climate system. Low-likelihood, \\nhigh-impact outcomes could occur at regional scales even for global warming \\nwithin the very likely assessed range for a given GHG emissions scenario. \\nGlobal mean sea level rise above the l i k e l y range – approaching 2 m by \\n2100 and in excess of 15 m by 2300 under a very high GHG emissions \\nscenario (SSP5-8.5) (low confidence ) – cannot be ruled out due to \\ndeep uncertainty in ice-sheet processes 123 and would have severe\\n[Search Result 5] livelihoods, food security, water supply, human security, and economic \\ngrowth are projected to increase with global warming of 1 .5°C. In \\nterrestrial ecosystems, 3 to 14% of the tens of thousands of species \\nassessed will likely face a very high risk of extinction at a GWL of 1.5°C. \\nCoral reefs are projected to decline by a further 70–90% at 1.5°C of \\nglobal warming ( high confidence ). At this GWL, many low-elevation \\nand small glaciers around the world would lose most of their mass or \\ndisappear within decades to centuries (high confidence ). 
Regions at \\ndisproportionately higher risk include Arctic ecosystems, dryland regions, \\nsmall island developing states and Least Developed Countries ( high \\nconfidence). {WGII SPM B.3, WGII SPM B.4.1, WGII TS.C.4.2; SR1.5 SPM A... [NOTE: function output was truncated since it exceeded the character limit (6571 > 6000)]\",\n \"time\": \"2024-11-15 10:11:41 AM IST+0530\"\n}", - "status": "success", - "function_call_id": "call_e9JFmCstpfODePsSUaOf8tZA" - }, - { - "id": "message-a1f944bf-b739-478f-a765-e4776401d1cf", - "date": "2024-11-15T04:41:44+00:00", - "message_type": "internal_monologue", - "internal_monologue": "I found detailed information about projected global warming. Time to summarize it clearly for Chad." - }, - { - "id": "message-a1f944bf-b739-478f-a765-e4776401d1cf", - "date": "2024-11-15T04:41:44+00:00", - "message_type": "function_call", - "function_call": { - "name": "send_message", - "arguments": "{\n \"message\": \"If no significant actions are taken, global warming is projected to exceed 2°C above pre-industrial levels by 2100, potentially reaching between 2°C and 4°C in worst-case scenarios. This would lead to severe impacts on ecosystems, sea-level rise, and increased climate-related hazards.\"\n}", - "function_call_id": "call_vglAIXtXhneq0u7btRNYrjGo" - } - }, - { - "id": "message-62b9f864-95b8-4ef8-b98d-066bd06fb7e9", - "date": "2024-11-15T04:41:44+00:00", - "message_type": "function_return", - "function_return": "{\n \"status\": \"OK\",\n \"message\": \"None\",\n \"time\": \"2024-11-15 10:11:44 AM IST+0530\"\n}", - "status": "success", - "function_call_id": "call_vglAIXtXhneq0u7btRNYrjGo" - } - ], - "usage": { - "completion_tokens": 154, - "prompt_tokens": 6708, - "total_tokens": 6862, - "step_count": 2 - } - } - - - -```python -response = letta_client.send_message( - agent_id=new_agent_id, - role="user", - message="Based on the report, Does greenhouse gases emission depend on income level?" -) - -print(response) -``` - - { - "messages": [ - { - "id": "message-225864f8-b60e-4202-b61b-7cd2b88c0a9c", - "date": "2024-11-15T04:41:45+00:00", - "message_type": "internal_monologue", - "internal_monologue": "Chad is asking an insightful question about the relationship between income levels and emissions. I need to find relevant information." - }, - { - "id": "message-225864f8-b60e-4202-b61b-7cd2b88c0a9c", - "date": "2024-11-15T04:41:45+00:00", - "message_type": "function_call", - "function_call": { - "name": "ask_climate_search", - "arguments": "{\n \"search_query\": \"greenhouse gas emissions and income level\",\n \"request_heartbeat\": true\n}", - "function_call_id": "call_fU0IqHN9balZ2nfgnROyXtAn" - } - }, - { - "id": "message-dda36b14-49b9-485b-865d-a7d5ed8e186f", - "date": "2024-11-15T04:41:46+00:00", - "message_type": "function_return", - "function_return": "{\n \"status\": \"OK\",\n \"message\": \"[Search Result 1] processes (CO2-FFI), due to improvements in energy intensity of GDP \\nand carbon intensity of energy, have been less than emissions increases \\nfrom rising global activity levels in industry, energy supply, transport, \\nagriculture and buildings. The 10% of households with the highest per \\ncapita emissions contribute 34– 45% of global consumption-based \\nhousehold GHG emissions, while the middle 40% contribute 40–53%, \\nand the bottom 50% contribute 13–15%. An increasing share of \\nemissions can be attributed to urban areas (a rise from about 62% \\nto 67–72% of the global share between 2015 and 2020). 
The drivers \\nof urban GHG emissions 73 are complex and include population size , \\nincome, state of urbanisation and urban form. ( high confidence ) \\n{WGIII SPM B.2, WGIII SPM B.2.3, WGIII SPM B.3.4, WGIII SPM D.1.1}\\n[Search Result 2] high socio-economic status contribute disproportionately to emissions, \\nand have the highest potential for emissions reductions, e.g., as \\ncitizens, investors, consumers, role models, and professionals ( high \\nconfidence). There are options on design of instruments such as taxes, \\nsubsidies, prices, and consumption-based approaches, complemented \\nby regulatory instruments to reduce high-emissions consumption while \\nimproving equity and societal well-being (high confidence). Behaviour \\nand lifestyle changes to help end-users adopt low-GHG-intensive \\noptions can be supported by policies, infrastructure and technology \\nwith multiple co- benefits for societal well-being ( high confidence ). \\nBroadening equitable access to domestic and international finance, \\ntechnologies and capacity can also act as a catalyst for accelerating \\nmitigation and shifting development pathways in low-income contexts \\n(high confidence ). Eradicating extreme poverty, energy poverty, and \\nproviding decent living standards to all in these regions in the context of \\nachieving sustainable development objectives, in the near term, can be \\nachieved without significant global emissions growth (high confidence). \\nTechnology development, transfer, capacity building and financing can \\nsupport developing countries/ regions leapfrogging or transitioning to \\nlow-emissions transport systems thereby providing multiple co-benefits \\n(high confidence ). Climate resilient development is advanced when\\n[Search Result 3] over 20% of global GHG emissions were covered by carbon taxes or \\nemissions trading systems, although coverage and prices have been \\ninsufficient to achieve deep reductions (medium confidence). Equity and \\ndistributional impacts of carbon pricing instruments can be addressed \\nby using revenue from carbon taxes or emissions trading to support \\nlow-income households, among other approaches (high confidence ). \\nThe mix of policy instruments which reduced costs and stimulated \\nadoption of solar energy, wind energy and lithium-ion batteries \\nincludes public R&D, funding for demonstration and pilot projects, and \\ndemand-pull instruments such as deployment subsidies to attain scale \\n(high confidence ) ( Figure 2.4). { WGIII SPM B.4.1, WGIII SPM B.5.2, \\nWGIII SPM E.4.2, WG III TS.3} \\nMitigation actions, supported by policies, have contributed \\nto a decrease in global energy and carbon intensity between \\n2010 and 2019, with a growing number of countries achieving \\nabsolute GHG emission reductions for more than a decade (high \\nconfidence). While global net GHG emissions have increased since \\n2010, global energy intensity (total primary energy per unit GDP) \\ndecreased by 2% yr –1 between 2010 and 2019. Global carbon \\nintensity (CO2-FFI per unit primary energy) also decreased by 0.3% \\nyr–1, mainly due to fuel switching from coal to gas, reduced expansion \\nof coal capacity, and increased use of renewables, and with large \\nregional variations over the same period. In many countries, policies\\n[Search Result 4] higher CO2-LULUCF emissions from a forest and peat fire event in South East Asia. Regions are as grouped in Annex II of WGIII. 
Panel (d) shows population, gross domestic product \\n(GDP) per person, emission indicators by region in 2019 for total GHG per person, and total GHG emissions intensity, together with production-based and consumption-based CO2-FFI data, \\nwhich is assessed in this report up to 2018. Consumption-based emissions are emissions released to the atmosphere in order to generate the goods and services consumed by a \\ncertain entity (e.g., region). Emissions from international aviation and shipping are not included. {WGIII Figure SPM.2}\\n2.1.2. Observed Climate System Changes and Impacts to \\nDate\\nIt is unequivocal that human influence has warmed the \\natmosphere, ocean and land. Widespread and rapid changes in \\nthe atmosphere, ocean, cryosphere and biosphere have occurred \\n(Table 2.1). The scale of recent changes across the climate system as \\na whole and the present state of many aspects of the climate system \\nare unprecedented over many centuries to many thousands of years. It \\nis very likely that GHG emissions were the main driver74 of tropospheric \\nwarming and extremely likely that human-caused stratospheric ozone \\ndepletion was the main driver of stratospheric cooling between 1979 \\nand the mid-1990s. It is virtually certain that the global upper ocean \\n(0-700m) has warmed since the 1970s and extremely likely that\\n[Search Result 5] in regional and national per capita emissions partly reflect different \\ndevelopment stages, but they also vary widely at similar income \\nlevels. Average per capita net anthropogenic GHG emissions in 2019 \\nranged from 2.6 tCO 2-eq to 19 tCO 2-eq across regions (Figure 2.2). \\nLeast Developed Countries (LDCs) and Small Island Developing States (SIDS) \\nhave much lower per capita emissions (1.7 tCO 2-eq and 4.6 tCO 2-eq, \\nrespectively) than the global average (6.9 tCO 2-eq), excluding \\nCO2-LULUCF . Around 48% of the global population in 2019 lives in countries \\nemitting on average more than 6 tCO2-eq per capita, 35% of the global \\npopulation live in countries emitting mor... [NOTE: function output was truncated since it exceeded the character limit (6717 > 6000)]\",\n \"time\": \"2024-11-15 10:11:46 AM IST+0530\"\n}", - "status": "success", - "function_call_id": "call_fU0IqHN9balZ2nfgnROyXtAn" - }, - { - "id": "message-ddda96a6-50d7-4311-95fc-91235550f1ce", - "date": "2024-11-15T04:41:48+00:00", - "message_type": "internal_monologue", - "internal_monologue": "I found relevant data showing the correlation between income levels and emissions. Now I need to convey this clearly." - }, - { - "id": "message-ddda96a6-50d7-4311-95fc-91235550f1ce", - "date": "2024-11-15T04:41:48+00:00", - "message_type": "function_call", - "function_call": { - "name": "send_message", - "arguments": "{\n \"message\": \"Yes, greenhouse gas emissions do depend on income levels. Wealthier households and regions typically contribute disproportionately to emissions due to higher consumption patterns. In contrast, lower-income areas often exhibit significantly lower emissions, although they may still face the brunt of climate impacts. 
Policies targeting equity can help reduce emissions while improving living standards.\"\n}", - "function_call_id": "call_BZZFO2SA5QGRq0SYJPATywVR" - } - }, - { - "id": "message-66b27339-dd39-4611-8d99-8cf1dc3521d0", - "date": "2024-11-15T04:41:48+00:00", - "message_type": "function_return", - "function_return": "{\n \"status\": \"OK\",\n \"message\": \"None\",\n \"time\": \"2024-11-15 10:11:48 AM IST+0530\"\n}", - "status": "success", - "function_call_id": "call_BZZFO2SA5QGRq0SYJPATywVR" - } - ], - "usage": { - "completion_tokens": 160, - "prompt_tokens": 10536, - "total_tokens": 10696, - "step_count": 2 - } - } - - - -```python -response = letta_client.send_message( - agent_id=new_agent_id, - role="user", - message="What are some high confidence methods which can help in reversing climate change?" -) - -print(response) -``` - - { - "messages": [ - { - "id": "message-8ccb5cbb-2e22-47a2-86a5-fa10e3554e1e", - "date": "2024-11-15T04:41:50+00:00", - "message_type": "internal_monologue", - "internal_monologue": "Chad is looking for effective methods to combat climate change. I need to gather reliable information about high-confidence strategies." - }, - { - "id": "message-8ccb5cbb-2e22-47a2-86a5-fa10e3554e1e", - "date": "2024-11-15T04:41:50+00:00", - "message_type": "function_call", - "function_call": { - "name": "ask_climate_search", - "arguments": "{\n \"search_query\": \"methods to reverse climate change\",\n \"request_heartbeat\": true\n}", - "function_call_id": "call_c1XF43X8Aazdaa818uAQlHQw" - } - }, - { - "id": "message-865a92bd-ae87-408f-bfea-3788f74dfba6", - "date": "2024-11-15T04:41:51+00:00", - "message_type": "function_return", - "function_return": "{\n \"status\": \"OK\",\n \"message\": \"[Search Result 1] restoration of wetlands and upstream forest ecosystems reduce \\na range of climate change risks, including flood risks, urban heat \\nand provide multiple co-benefits. Some land-based adaptation \\noptions provide immediate benefits (e.g., conservation of peatlands,\\n[Search Result 2] 107\\nNear-Term Responses in a Changing Climate\\nSection 4\\nImproved access to clean energy sources and technologies, and shifts \\nto active mobility (e.g., walking and cycling) and public transport can \\ndeliver socioeconomic, air quality and health benefits, especially \\nfor women and children ( high confidence ). {WGII SPM C.2.2, WGII \\nSPM C.2.11, WGII Cross-Chapter Box HEALTH; WGIII SPM C.2.2, \\nWGIII SPM C.4.2, WGIII SPM C.9.1, WGIII SPM C.10.4, WGIII SPM \\nD.1.3, WGIII Figure SPM.6, WGIII Figure SPM.8; SRCCL SPM B.6.2, \\nSRCCL SPM B.6.3, SRCCL B.4.6, SRCCL SPM C.2.4 }\\nEffective adaptation options exist to help protect human health \\nand well-being (high confidence ). Health Action Plans that include \\nearly warning and response systems are effective for extreme heat (high \\nconfidence). Effective options for water-borne and food-borne diseases \\ninclude improving access to potable water, reducing exposure of water and \\nsanitation systems to flooding and extreme weather events, and improved \\nearly warning systems (very high confidence). For vector-borne diseases, \\neffective adaptation options include surveillance, early warning \\nsystems, and vaccine development ( very high confidence ). Effective \\nadaptation options for reducing mental health risks under climate \\nchange include improving surveillance and access to mental health \\ncare, and monitoring of psychosocial impacts from extreme weather \\nevents ( high confidence ). 
A key pathway to climate resilience in the\\n[Search Result 3] and ambition of climate governance (medium confidence). Engaging \\nIndigenous Peoples and local communities using just- transition and \\nrights-based decision-making approaches, implemented through \\ncollective and participatory decision-making processes has enabled \\ndeeper ambition and accelerated action in different ways, and at all \\nscales, depending on national circumstances ( medium confidence ). \\nThe media helps shape the public discourse about climate change. This \\ncan usefully build public support to accelerate climate action ( medium \\nevidence, high agreement ). In s o m e i n s t a n c e s , public discourses of \\nmedia and organised counter movements have impeded climate \\naction, exacerbating helplessness and disinformation and fuelling \\npolarisation, with negative implications for climate action ( medium \\nconfidence). {WGII SPM C.5.1, WGII SPM D.2, WGII TS.D.9, WGII TS.D.9.7, \\nWGII TS.E.2.1, WGII 18.4; WGIII SPM D.3.3, WGIII SPM E.3.3, WGIII TS.6.1, \\nWGIII 6.7, WGIII 13 ES, WGIII Box.13.7}\\n2.2.2. Mitigation Actions to Date\\nThere has been a consistent expansion of policies and laws \\naddressing mitigation since AR5 (high confidence ). Climate \\ngovernance supports mitigation by providing frameworks through \\nwhich diverse actors interact, and a basis for policy development and \\nimplementation (medium confidence ). Many regulatory and economic \\ninstruments have already been deployed successfully (high confidence).\\n[Search Result 4] 87\\nLong-Term Climate and Development Futures\\nSection 3\\nfrom zero or low-carbon sources in 2050, such as renewables or \\nfossil fuels with CO 2 capture and storage, combined with increased \\nelectrification of energy demand. Such pathways meet energy service \\ndemand with relatively low energy use, through e.g., enhanced energy \\nefficiency and behavioural changes and increased electrification of \\nenergy end use. Modelled global pathways limiting global warming to \\n1.5°C (>50%) with no or limited overshoot generally implement such \\nchanges faster than pathways limiting global warming to 2°C (> 67%). \\n(high confidence ) {WGIII SPM C.3, WGIII SPM C.3.2, WGIII SPM C.4, \\nWGIII TS.4.2; SR1.5 SPM C.2.2}\\nAFOLU mitigation options, when sustainably implemented, can \\ndeliver large-scale GHG emission reductions and enhanced CO 2 \\nremoval; however, barriers to implementation and trade-offs \\nmay result from the impacts of climate change, competing \\ndemands on land, conflicts with food security and livelihoods, \\nthe complexity of land ownership and management systems, \\nand cultural aspects (see 3.4.1). A l l a s s e s s e d m o d e l l e d pathways \\nthat limit warming to 2°C (>67%) or lower by 2100 include land-based \\nmitigation and land-use change, with most including different \\ncombinations of reforestation, afforestation, reduced deforestation, and \\nbioenergy. 
However, accumulated carbon in vegetation and soils is at \\nrisk from future loss (or sink reversal) triggered by climate change and\\n[Search Result 5] 91\\nSection 4\\nNear-Term Responses \\nin a Changing Climate\",\n \"time\": \"2024-11-15 10:11:51 AM IST+0530\"\n}", - "status": "success", - "function_call_id": "call_c1XF43X8Aazdaa818uAQlHQw" - }, - { - "id": "message-a3990cb7-fa05-4c7b-91fb-c896d72d7c7f", - "date": "2024-11-15T04:41:53+00:00", - "message_type": "internal_monologue", - "internal_monologue": "I've identified several reliable methods to help reverse climate change. I need to list them clearly for Chad." - }, - { - "id": "message-a3990cb7-fa05-4c7b-91fb-c896d72d7c7f", - "date": "2024-11-15T04:41:53+00:00", - "message_type": "function_call", - "function_call": { - "name": "send_message", - "arguments": "{\n \"message\": \"High-confidence methods to reverse climate change include: 1) **Restoration of ecosystems** such as wetlands and forests; 2) **Transition to clean energy sources** and electrification; 3) **Enhancing energy efficiency** and adopting sustainable land-use practices; 4) **Utilizing carbon capture and storage** technologies. These strategies can significantly contribute to reducing greenhouse gases and enhancing carbon sinks.\"\n}", - "function_call_id": "call_DfTWwqfiU9MwOsWn5LNM5Z4d" - } - }, - { - "id": "message-0d684e7b-5948-4998-80a9-4a26f20a6fff", - "date": "2024-11-15T04:41:53+00:00", - "message_type": "function_return", - "function_return": "{\n \"status\": \"OK\",\n \"message\": \"None\",\n \"time\": \"2024-11-15 10:11:53 AM IST+0530\"\n}", - "status": "success", - "function_call_id": "call_DfTWwqfiU9MwOsWn5LNM5Z4d" - } - ], - "usage": { - "completion_tokens": 175, - "prompt_tokens": 14071, - "total_tokens": 14246, - "step_count": 2 - } - } - diff --git a/tutorial/markdown/generated/vector-search-cookbook/mistralai.md b/tutorial/markdown/generated/vector-search-cookbook/mistralai.md deleted file mode 100644 index d85dd80..0000000 --- a/tutorial/markdown/generated/vector-search-cookbook/mistralai.md +++ /dev/null @@ -1,234 +0,0 @@ ---- -# frontmatter -path: "/tutorial-mistralai-couchbase-vector-search" -title: Using Mistral AI Embeddings with Couchbase Vector Search -short_title: Mistral AI with Couchbase Vector Search -description: - - Learn how to generate embeddings using Mistral AI and store them in Couchbase. - - This tutorial demonstrates how to use Couchbase's vector search capabilities with Mistral AI embeddings. - - You'll understand how to perform vector search to find relevant documents based on similarity. -content_type: tutorial -filter: sdk -technology: - - vector search -tags: - - Artificial Intelligence - - Mistral AI -sdk_language: - - python -length: 30 Mins ---- - - - - - -[View Source](https://github.com/couchbase-examples/vector-search-cookbook/tree/main/mistralai/mistralai.ipynb) - -# Introduction - -Couchbase is a NoSQL distributed document database (JSON) with many of the best features of a relational DBMS: SQL, distributed ACID transactions, and much more. [Couchbase Capella™](https://cloud.couchbase.com/sign-up) is the easiest way to get started, but you can also download and run [Couchbase Server](http://couchbase.com/downloads) on-premises. - -Mistral AI is a research lab building the best open source models in the world. La Plateforme enables developers and enterprises to build new products and applications, powered by Mistral’s open source and commercial LLMs. 
The [Mistral AI APIs](https://console.mistral.ai/) empower LLM applications via:

- [Text generation](https://docs.mistral.ai/capabilities/completion/), which supports streaming and the ability to display partial model results in real time
- [Code generation](https://docs.mistral.ai/capabilities/code_generation/), which powers code-generation tasks, including fill-in-the-middle and code completion
- [Embeddings](https://docs.mistral.ai/capabilities/embeddings/), which represent the meaning of text as a list of numbers; useful for RAG
- [Function calling](https://docs.mistral.ai/capabilities/function_calling/), which enables Mistral models to connect to external tools
- [Fine-tuning](https://docs.mistral.ai/capabilities/finetuning/), which enables developers to create customized and specialized models
- [JSON mode](https://docs.mistral.ai/capabilities/json_mode/), which enables developers to set the response format to json_object
- [Guardrailing](https://docs.mistral.ai/capabilities/guardrailing/), which enables developers to enforce policies at the system level of Mistral models

This tutorial focuses on the embeddings capability; a short retrieval-augmented generation sketch at the very end shows embeddings and text generation working together.

# How to run this tutorial

This tutorial is available as a Jupyter Notebook (`.ipynb` file) that you can run interactively. You can access the original notebook [here](https://github.com/couchbase-examples/vector-search-cookbook/blob/main/mistralai/mistralai.ipynb).

You can either download the notebook file and run it on [Google Colab](https://colab.research.google.com/) or run it on your system by setting up the Python environment.

# Before you start

## Get Credentials for Mistral AI

Please follow the [instructions](https://console.mistral.ai/api-keys/) to generate the Mistral AI credentials.

## Create and Deploy Your Free Tier Operational cluster on Capella

To get started with Couchbase Capella, create an account and use it to deploy a forever free tier operational cluster. This account provides you with an environment where you can explore and learn about Capella with no time constraint.

To learn more, please follow the [instructions](https://docs.couchbase.com/cloud/get-started/create-account.html).

### Couchbase Capella Configuration

When running Couchbase using [Capella](https://cloud.couchbase.com/sign-in), the following prerequisites need to be met.

* Create the [database credentials](https://docs.couchbase.com/cloud/clusters/manage-database-users.html) with Read and Write access to the bucket used in the application.
* [Allow access](https://docs.couchbase.com/cloud/clusters/allow-ip-address.html) to the Cluster from the IP on which the application is running.
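One Capella-specific detail is worth calling out before moving on: Capella clusters are reached over TLS, so the cluster URL collected later in this tutorial should use the `couchbases://` scheme rather than a bare hostname. The sketch below is not part of the original notebook; the endpoint, username, and password are hypothetical placeholders, and it uses the same SDK calls as the Couchbase Connection section further down (you will install the SDK in the next step):

```python
from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

# Hypothetical Capella endpoint; copy the real one from the Capella UI.
cluster = Cluster(
    "couchbases://cb.abcdefgh.cloud.couchbase.com",
    ClusterOptions(PasswordAuthenticator("your-username", "your-password")),
)
# Block until the cluster is ready to serve requests.
cluster.wait_until_ready(timedelta(seconds=5))
```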
# Install necessary libraries


```python
!pip install couchbase==4.3.5 mistralai==1.7.0
```

    [Output too long, omitted for brevity]

# Imports


```python
from datetime import timedelta
import uuid

from mistralai import Mistral

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions, SearchOptions
import couchbase.search as search
from couchbase.vector_search import VectorQuery, VectorSearch
```

# Prerequisites


```python
import getpass

couchbase_cluster_url = input("Cluster URL:")
couchbase_username = input("Couchbase username:")
couchbase_password = getpass.getpass("Couchbase password:")
couchbase_bucket = input("Couchbase bucket:")
couchbase_scope = input("Couchbase scope:")
couchbase_collection = input("Couchbase collection:")
```

    Cluster URL: localhost
    Couchbase username: Administrator
    Couchbase password: ········
    Couchbase bucket: mistralai
    Couchbase scope: _default
    Couchbase collection: mistralai

# Couchbase Connection


```python
auth = PasswordAuthenticator(
    couchbase_username,
    couchbase_password
)
```


```python
cluster = Cluster(couchbase_cluster_url, ClusterOptions(auth))
cluster.wait_until_ready(timedelta(seconds=5))

bucket = cluster.bucket(couchbase_bucket)
scope = bucket.scope(couchbase_scope)
collection = scope.collection(couchbase_collection)
```

# Creating Couchbase Vector Search Index

To store Mistral embeddings in a Couchbase cluster, a vector search index needs to be created first. We have included a sample index definition that works with this tutorial in the `mistralai_index.json` file. The definition can be used to create a vector index in the Couchbase Server web console; for more information on vector indexes, please read [Create a Vector Search Index with the Server Web Console](https://docs.couchbase.com/server/current/vector-search/create-vector-search-index-ui.html).


```python
# Fetch the index definition to confirm the index exists before storing embeddings.
search_index_name = couchbase_bucket + "._default.vector_test"
search_index = cluster.search_indexes().get_index(search_index_name)
```

# Mistral Connection


```python
MISTRAL_API_KEY = getpass.getpass("Mistral API Key:")
mistral_client = Mistral(api_key=MISTRAL_API_KEY)
```
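Returning briefly to the index created earlier: a vector search index definition of the kind shipped as `mistralai_index.json` generally has the shape sketched below. This is an illustration rather than the file's verbatim contents (JSON allows no comments, so all caveats live here): the index name `vector_test`, source bucket `mistralai`, collection mapping `_default.mistralai`, and the `dot_product` similarity metric are assumptions that must match your setup and the sample file, while `dims` must equal the 1024 dimensions that `mistral-embed` produces:

```json
{
  "name": "vector_test",
  "type": "fulltext-index",
  "sourceType": "gocbcore",
  "sourceName": "mistralai",
  "params": {
    "doc_config": {
      "mode": "scope.collection.type_field",
      "type_field": "type"
    },
    "mapping": {
      "default_mapping": { "enabled": false },
      "types": {
        "_default.mistralai": {
          "enabled": true,
          "dynamic": false,
          "properties": {
            "text": {
              "fields": [
                { "name": "text", "type": "text", "index": true, "store": true }
              ]
            },
            "vector": {
              "fields": [
                {
                  "name": "vector",
                  "type": "vector",
                  "dims": 1024,
                  "similarity": "dot_product",
                  "index": true
                }
              ]
            }
          }
        }
      }
    },
    "store": { "indexType": "scorch", "segmentVersion": 16 }
  }
}
```

Whatever the exact definition, the index must live on the bucket's `_default` scope under the name `vector_test` for the `get_index` lookup above to succeed.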
# Embedding Documents

The Mistral client can be used to generate vector embeddings for given text fragments. These embeddings represent the semantic meaning of the corresponding fragments and can be stored in Couchbase for later retrieval. You can also add a custom text fragment to the array of texts being embedded when running this code block:


```python
texts = [
    "Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.",
    "It’s used across industries for things like user profiles, dynamic product catalogs, GenAI apps, vector search, high-speed caching, and much more.",
    input("custom embedding text"),
]
embeddings = mistral_client.embeddings.create(
    model="mistral-embed",
    inputs=texts,
)

print("Output embeddings: " + str(len(embeddings.data)))
```

The output `embeddings` is an `EmbeddingResponse` object containing the embeddings and the token usage information:

```
EmbeddingResponse(
    id='eb4c2c739780415bb3af4e47580318cc', object='list', data=[
        Data(object='embedding', embedding=[-0.0165863037109375,...], index=0),
        Data(object='embedding', embedding=[-0.0234222412109375,...], index=1),
        Data(object='embedding', embedding=[-0.0466222735279375,...], index=2)],
    model='mistral-embed', usage=EmbeddingResponseUsage(prompt_tokens=15, total_tokens=15)
)
```

# Storing Embeddings in Couchbase

Each embedding needs to be stored as a Couchbase document. According to the provided search index definition, the embedding vector values must be stored in the `vector` field. The original text of the embedding can be stored in the same document:


```python
for i, text in enumerate(texts):
    # Store each text fragment together with its embedding vector; the
    # search index maps the "vector" field to the vector index.
    doc = {
        "id": str(uuid.uuid4()),
        "text": text,
        "vector": embeddings.data[i].embedding,
    }
    collection.upsert(doc["id"], doc)
```

# Searching For Embeddings

Embeddings stored in Couchbase can later be searched using the vector index, for example, to find the text fragments most relevant to a user-entered prompt:


```python
search_embedding = mistral_client.embeddings.create(
    model="mistral-embed",
    inputs=["name a multipurpose database with distributed capability"],
).data[0]

# Wrap the query vector in a vector search request against the "vector" field.
search_req = search.SearchRequest.create(search.MatchNoneQuery()).with_vector_search(
    VectorSearch.from_vector_query(
        VectorQuery("vector", search_embedding.embedding, num_candidates=1)
    )
)
result = scope.search(
    "vector_test",
    search_req,
    SearchOptions(
        limit=13,
        fields=["vector", "id", "text"]
    )
)
for row in result.rows():
    print("Found answer: " + row.id + "; score: " + str(row.score))
    doc = collection.get(row.id)
    print("Answer text: " + doc.value["text"])
```

    Found answer: 7a4c24dd-393f-4f08-ae42-69ea7009dcda; score: 1.7320726542316662
    Answer text: Couchbase Server is a multipurpose, distributed database that fuses the strengths of relational databases such as SQL and ACID transactions with JSON’s versatility, with a foundation that is extremely fast and scalable.
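To close the loop on the RAG use case mentioned in the capabilities list at the start of this tutorial, the retrieved fragment can be handed to a Mistral chat model as grounding context. The sketch below is not part of the original notebook: it reuses `mistral_client` and the `doc` fetched in the search loop above, and the model name `mistral-small-latest` is an illustrative choice:

```python
# A minimal RAG-style follow-up (illustrative sketch, not from the notebook):
# answer a question using the text fragment retrieved by vector search above.
question = "Name a multipurpose database with distributed capability."
context = doc.value["text"]  # the top hit fetched in the loop above

chat_response = mistral_client.chat.complete(
    model="mistral-small-latest",  # assumed model choice for illustration
    messages=[
        {
            "role": "user",
            "content": "Answer the question using only the context below.\n\n"
            f"Context: {context}\n\nQuestion: {question}",
        }
    ],
)
print(chat_response.choices[0].message.content)
```

Because the answer is constrained to the retrieved context, the chat model's output stays grounded in the documents stored in Couchbase.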