# CouchbaseStorage for CrewAI

This notebook demonstrates how to use Couchbase as a vector store for CrewAI's memory system. The implementation provides:
- Vector similarity search for semantic document retrieval
- Document storage with metadata and embeddings
- Error handling that logs each failure and re-raises the exception
- Integration with CrewAI agents
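
To make the first bullet concrete, the sketch below ranks a few documents against a query by cosine similarity, one common metric behind vector similarity search. The three-dimensional vectors and the `rank_by_similarity` helper are made up for illustration. Real embeddings come from a model such as the OpenAI one used later in this notebook, and Couchbase performs the comparison inside its search index rather than in Python.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # Treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: List[float], docs: Dict[str, List[float]], limit: int = 3
) -> List[Tuple[str, float]]:
    """Return the `limit` document ids most similar to the query, best first."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:limit]


# Hypothetical toy embeddings, chosen only to make the ranking easy to follow
toy_docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.7, 0.7, 0.0],
    "doc_c": [0.0, 0.0, 1.0],
}
top = rank_by_similarity([1.0, 0.1, 0.0], toy_docs, limit=2)
print(top)  # doc_a first, then doc_b
```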

## Prerequisites

Before running this notebook, ensure you have:
1. A running Couchbase cluster (local or Capella)
2. OpenAI API key for generating embeddings
3. Vector search index configured in Couchbase

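Set up your environment variables in a .env file. CB_HOST is a connection string such as couchbase://localhost. The storage class below also reads two optional variables, SCOPE_NAME (default: shared) and INDEX_NAME (default: vector_search_crew):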
```
OPENAI_API_KEY=your_key_here
CB_USERNAME=your_username
CB_PASSWORD=your_password
CB_HOST=your_host
CB_BUCKET_NAME=your_bucket
```
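
As a rough sketch of how these variables might be validated before connecting, the helper below fails early when a required value is missing and falls back to the same defaults that the storage class uses later in this notebook. The `load_settings` function and the example host are hypothetical, not part of any library.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = ("OPENAI_API_KEY",)
# Defaults mirror the fallbacks used by the CouchbaseStorage class in this notebook
DEFAULTS = {
    "CB_USERNAME": "Administrator",
    "CB_PASSWORD": "password",
    "CB_HOST": "couchbase://localhost",
    "CB_BUCKET_NAME": "vector-search-testing",
}


def load_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Collect connection settings, raising if a required variable is unset."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise ValueError(
            f"Missing required environment variables: {', '.join(missing)}"
        )
    settings = {name: env[name] for name in REQUIRED}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name) or default
    return settings


# Example with an explicit mapping instead of the real process environment
example = load_settings(
    {"OPENAI_API_KEY": "sk-example", "CB_HOST": "couchbase://db.example.internal"}
)
print(example["CB_HOST"], example["CB_BUCKET_NAME"])
```

Passing a plain dictionary instead of `os.environ` keeps the check easy to exercise without touching real credentials.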

## Setup

First, let's install the required packages:

In [1]:
%pip install --quiet 'crewai[tools]' langchain-couchbase langchain-openai python-dotenv

Note: you may need to restart the kernel to use updated packages.


## Implementation

Import required libraries and configure logging:

In [2]:
import logging
import os
from typing import Any, Dict, List, Optional

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions
from crewai.memory.storage.rag_storage import RAGStorage
from dotenv import load_dotenv
from langchain_couchbase.vectorstores import CouchbaseVectorStore
from langchain_openai import ChatOpenAI, OpenAIEmbeddings

from crewai import Agent, Crew, Process, Task

# Configure logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(levelname)s] %(message)s',
    datefmt='%Y-%m-%d %H:%M:%S'
)
logger = logging.getLogger(__name__)

# Raise every other existing logger to WARNING so only this notebook's INFO messages appear
for name in logging.root.manager.loggerDict:
    if name != __name__:
        logging.getLogger(name).setLevel(logging.WARNING)
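
The loop above only adjusts loggers that already exist when the cell runs. A library that creates its logger later still inherits the root level. One alternative, sketched here under the assumption that only this notebook's messages should appear at INFO, is to set the root logger to WARNING and opt a single named logger back in. The `configure_quiet_logging` function and the `demo_notebook` logger name are hypothetical.

```python
import logging


def configure_quiet_logging(own_name: str) -> logging.Logger:
    """Keep third-party loggers at WARNING while one named logger emits INFO."""
    logging.basicConfig(
        format="%(asctime)s [%(levelname)s] %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )
    # Loggers without an explicit level inherit WARNING from the root
    logging.getLogger().setLevel(logging.WARNING)
    own = logging.getLogger(own_name)
    own.setLevel(logging.INFO)
    return own


demo_logger = configure_quiet_logging("demo_notebook")
demo_logger.info("Shown: this logger was opted in at INFO")
logging.getLogger("some.library").info("Suppressed: inherits WARNING from root")
```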

## CouchbaseStorage Class

The CouchbaseStorage class extends CrewAI's RAGStorage to provide vector search capabilities:

In [3]:
class CouchbaseStorage(RAGStorage):
    """Extends RAGStorage to handle embeddings for memory entries using Couchbase."""

    def __init__(self, type, allow_reset=True, embedder_config=None, crew=None):
        try:
            super().__init__(type, allow_reset, embedder_config, crew)
            self._initialize_app()
            logger.info(f"CouchbaseStorage initialized for type: {type}")
        except Exception as e:
            logger.error(f"Failed to initialize CouchbaseStorage: {str(e)}")
            raise

    def search(
        self,
        query: str,
        limit: int = 3,
        filter: Optional[dict] = None,
        score_threshold: float = 0,
    ) -> List[Any]:
        """Search memory entries using vector similarity."""
        try:
            results = self.vector_store.similarity_search_with_score(
                query,
                k=limit,
                filter=filter,
                score_threshold=score_threshold
            )
            
            return [{
                "id": str(i),
                "metadata": doc.metadata,
                "context": doc.page_content,
                "score": score
            } for i, (doc, score) in enumerate(results)]
        except Exception as e:
            logger.error(f"Search failed: {str(e)}")
            raise

    def reset(self) -> None:
        """Reset the memory storage."""
        if self.allow_reset:
            try:
                # Create primary index if it doesn't exist
                self.cluster.query(
                    f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{self.bucket_name}`.`{self.scope_name}`.`{self.collection_name}`"
                ).execute()
                
                # Delete all documents
                self.cluster.query(
                    f"DELETE FROM `{self.bucket_name}`.`{self.scope_name}`.`{self.collection_name}`"
                ).execute()
                logger.info(f"Successfully reset collection: {self.collection_name}")
            except Exception as e:
                logger.error(f"Reset failed: {str(e)}")
                raise

    def _initialize_app(self):
        """Initialize Couchbase client and vector store."""
        try:
            # Check for required environment variables
            if not os.getenv('OPENAI_API_KEY'):
                raise ValueError("OPENAI_API_KEY environment variable is required")

            # Initialize OpenAI embeddings
            self.embeddings = OpenAIEmbeddings(
                openai_api_key=os.getenv('OPENAI_API_KEY'),
                model="text-embedding-ada-002"
            )

            # Connect to Couchbase
            auth = PasswordAuthenticator(
                os.getenv('CB_USERNAME', 'Administrator'),
                os.getenv('CB_PASSWORD', 'password')
            )
            self.cluster = Cluster(
                os.getenv('CB_HOST', 'couchbase://localhost'),
                ClusterOptions(auth)
            )
            
            # Set up bucket, scope, and collection names
            self.bucket_name = os.getenv('CB_BUCKET_NAME', 'vector-search-testing')
            self.scope_name = os.getenv('SCOPE_NAME', 'shared')
            self.collection_name = self.type  # Use the type parameter as collection name
            self.index_name = os.getenv('INDEX_NAME', 'vector_search_crew')

            # Create primary index if it doesn't exist
            self.cluster.query(
                f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{self.bucket_name}`.`{self.scope_name}`.`{self.collection_name}`"
            ).execute()

            # Initialize vector store
            self.vector_store = CouchbaseVectorStore(
                cluster=self.cluster,
                bucket_name=self.bucket_name,
                scope_name=self.scope_name,
                collection_name=self.collection_name,
                embedding=self.embeddings,
                index_name=self.index_name,
            )
            logger.info("Storage initialized successfully")

        except Exception as e:
            logger.error(f"Initialization failed: {str(e)}")
            raise

    def save(self, value: Any, metadata: Dict[str, Any]) -> None:
        """Save a memory entry with metadata."""
        # Local import keeps the method self-contained
        from uuid import uuid4

        try:
            metadata = metadata or {}
            # Use the caller-supplied id when present; otherwise generate a unique
            # one so entries without an id never overwrite each other
            doc_id = f"{self.type}_{metadata.get('id', uuid4().hex)}"
            self.vector_store.add_texts(
                texts=[value],
                metadatas=[metadata],
                ids=[doc_id]
            )
            logger.info(f"Successfully saved entry with metadata: {metadata}")
        except Exception as e:
            logger.error(f"Save failed: {str(e)}")
            raise
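
The `save` method builds each document key from the storage type plus an `id` taken from the metadata. Since writing to an existing key would typically replace the stored document, entries saved without an `id` need a fallback that cannot collide. A minimal sketch of one such key scheme follows. The `make_document_key` helper is hypothetical, and a random UUID is only one reasonable choice of fallback.

```python
import uuid
from typing import Any, Dict, Optional


def make_document_key(
    storage_type: str, metadata: Optional[Dict[str, Any]] = None
) -> str:
    """Build '<type>_<id>', generating a random id when metadata has none."""
    metadata = metadata or {}
    suffix = metadata.get("id", uuid.uuid4().hex)
    return f"{storage_type}_{suffix}"


print(make_document_key("crew_stm_demo", {"id": 42}))  # crew_stm_demo_42

# Entries without an id each receive a distinct key
keys = {make_document_key("crew_stm_demo", {"category": "search"}) for _ in range(3)}
print(len(keys))
```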

## Testing Vector Search

Let's test the vector search functionality:

In [4]:
# Load environment variables
load_dotenv()

# Initialize storage
storage = CouchbaseStorage("crew_stm_demo")

# Clear existing data
storage.reset()

# Test saving entries
test_entries = [
    ("Vector search uses mathematical vectors to find similar items by converting data into high-dimensional vector space", 
     {"category": "technology", "type": "concept"}),
    ("Couchbase vector search enables semantic similarity matching by storing and comparing vector embeddings", 
     {"category": "database", "type": "implementation"}),
    ("Vector embeddings represent text, images, and other data as numerical vectors for efficient similarity search", 
     {"category": "search", "type": "technique"})
]

for text, metadata in test_entries:
    storage.save(text, metadata)

# Test searching
query = "Tell me about vector search"
results = storage.search(query, limit=2)

print(f"Search results for query: '{query}'")
print("-"*80)
for result in results:
    print("\nResult:")
    print(f"Context: {result['context']}")
    print(f"Metadata: {result['metadata']}")
    print(f"Score: {result['score']}")
    print("-"*80)

2025-01-15 16:38:18 [INFO] Storage initialized successfully
2025-01-15 16:38:26 [INFO] Storage initialized successfully
2025-01-15 16:38:26 [INFO] CouchbaseStorage initialized for type: crew_stm_demo
2025-01-15 16:38:28 [INFO] Successfully reset collection: crew_stm_demo
2025-01-15 16:38:31 [INFO] Successfully saved entry with metadata: {'category': 'technology', 'type': 'concept'}
2025-01-15 16:38:34 [INFO] Successfully saved entry with metadata: {'category': 'database', 'type': 'implementation'}
2025-01-15 16:38:37 [INFO] Successfully saved entry with metadata: {'category': 'search', 'type': 'technique'}


Search results for query: 'Tell me about vector search'
--------------------------------------------------------------------------------

Result:
Context: A Google Street View image of a man loading a large white plastic bag into the boot of his car has helped unravel a murder case in a northern Spanish town, police say. The Google app allows users to see images of streets around the world - filmed by cars mounted with cameras. It captured the exact moment the body of the victim was allegedly being removed. Two people were arrested last month, accused of being responsible for the disappearance and murder of a man in October last year. His dismembered remains were found in a cemetery last week.

This was the first time in 15 years that the Google car had been to the town of Tajueco, in the northern province of Soria. Officials say another photo sequence shows the blurred silhouette of someone transporting a large white bundle in a wheelbarrow. However, police said the images were not "d
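
The test above needs a live cluster and an OpenAI key. For quick offline checks of code that depends on the storage interface, a small in-memory stand-in can mirror the same `save`, `search`, and `reset` methods and the same result shape (`id`, `metadata`, `context`, `score`). The sketch below is hypothetical and not part of CrewAI or Couchbase: `InMemoryStorage`, `embed`, and `cosine` are illustrative names, and the word-count embedding is a toy substitute for real embeddings, so its scores are not comparable to those Couchbase returns.

```python
import itertools
import math
import re
from collections import Counter
from typing import Any, Dict, List, Optional


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class InMemoryStorage:
    """Dictionary-backed stand-in exposing save, search, and reset."""

    def __init__(self, type: str) -> None:
        self.type = type
        self._docs: Dict[str, Dict[str, Any]] = {}
        self._ids = itertools.count(1)  # Fallback ids for entries without one

    def save(self, value: str, metadata: Optional[Dict[str, Any]] = None) -> None:
        metadata = metadata or {}
        key = f"{self.type}_{metadata.get('id', next(self._ids))}"
        self._docs[key] = {"text": value, "metadata": metadata, "vector": embed(value)}

    def search(
        self, query: str, limit: int = 3, score_threshold: float = 0.0
    ) -> List[Dict[str, Any]]:
        query_vector = embed(query)
        scored = [
            (key, doc, cosine(query_vector, doc["vector"]))
            for key, doc in self._docs.items()
        ]
        # Drop weak matches, then return the best `limit` results first
        scored = [item for item in scored if item[2] >= score_threshold]
        scored.sort(key=lambda item: item[2], reverse=True)
        return [
            {"id": key, "metadata": doc["metadata"], "context": doc["text"], "score": score}
            for key, doc, score in scored[:limit]
        ]

    def reset(self) -> None:
        self._docs.clear()


memory = InMemoryStorage("crew_stm_demo")
memory.save("Vector search finds similar items in vector space", {"category": "technology"})
memory.save("Couchbase stores JSON documents", {"category": "database"})
hits = memory.search("vector search", limit=1)
print(hits[0]["metadata"])  # {'category': 'technology'}
```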

## CrewAI Integration

Now let's see how to use CouchbaseStorage with CrewAI agents:

In [5]:
# Initialize language model
llm = ChatOpenAI(
    openai_api_key=os.getenv('OPENAI_API_KEY'),
    model="gpt-4",
    temperature=0.7
)

# Create agents
researcher = Agent(
    role='Research Expert',
    goal='Find relevant information',
    backstory='Expert at finding and analyzing information',
    llm=llm,
    memory=True,
    memory_storage=storage
)

writer = Agent(
    role='Technical Writer',
    goal='Create clear documentation',
    backstory='Expert at technical writing and documentation',
    llm=llm,
    memory=True,
    memory_storage=storage
)

# Create tasks
research_task = Task(
    description='Research vector search capabilities',
    agent=researcher,
    expected_output="Detailed findings about vector search technology and implementations"
)

writing_task = Task(
    description='Document the findings',
    agent=writer,
    expected_output="Clear and comprehensive documentation of the research findings",
    context=[research_task]
)

# Create and run crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, writing_task],
    process=Process.sequential,
    verbose=False
)

result = crew.kickoff()
print("\nCrew Result:")
print("-"*80)
print(result)
print("-"*80)


Crew Result:
--------------------------------------------------------------------------------
Vector search is a prominent technique in the field of information retrieval. It operates on the principle of multi-dimensional vectors to identify the most relevant data. Conceptually, each data point is located in a multi-dimensional space, and their "distance" from each other is used to establish their similarity.

Vector search finds extensive use across numerous applications. It's primarily used in recommendation systems to propose products or content that align with users' past interactions. For instance, Netflix employs vector search to suggest shows to its viewers based on their previous viewing patterns. In the domain of natural language processing (NLP), vector search is used to comprehend semantic similarities between different words or phrases. Google utilizes vector search in its search engine to provide the most pertinent search results.

There are a variety of technologies that