In [3]:
# Install the packages used in this notebook
!pip install python-dotenv couchbase datasets langchain_core langchain_cohere langchain_couchbase tqdm

Collecting python-dotenv
  Using cached python_dotenv-1.0.1-py3-none-any.whl.metadata (23 kB)
Collecting couchbase
  Using cached couchbase-4.3.0-cp310-cp310-manylinux2014_x86_64.manylinux_2_17_x86_64.whl.metadata (23 kB)
Collecting datasets
  Using cached datasets-2.21.0-py3-none-any.whl.metadata (21 kB)
Collecting langchain_core
  Using cached langchain_core-0.2.34-py3-none-any.whl.metadata (6.2 kB)
Collecting langchain_cohere
  Using cached langchain_cohere-0.2.2-py3-none-any.whl.metadata (6.6 kB)
Collecting langchain_couchbase
  Using cached langchain_couchbase-0.1.1-py3-none-any.whl.metadata (1.9 kB)
Collecting dill<0.3.9,>=0.3.0 (from datasets)
  Downloading dill-0.3.8-py3-none-any.whl.metadata (10 kB)
Collecting xxhash (from datasets)
  Downloading xxhash-3.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (12 kB)
Collecting multiprocess (from datasets)
  Downloading multiprocess-0.70.16-py310-none-any.whl.metadata (7.2 kB)
Collecting aiohttp (from datasets

# Importing Necessary Libraries
The script starts by importing the libraries it needs: standard-library modules for JSON handling, logging, and timing; the Couchbase SDK for database connections and index management; the LangChain integrations for Cohere embeddings, the Cohere chat model, and the Couchbase vector store and cache; and the `datasets` library for loading the sample data.

In [7]:
import json
import logging
import os
import time
import warnings
import getpass
from datetime import timedelta
from uuid import uuid4

import numpy as np
from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.exceptions import (CouchbaseException,
                                  InternalServerFailureException,
                                  QueryIndexAlreadyExistsException)
from couchbase.management.search import SearchIndex
from couchbase.options import ClusterOptions
from datasets import load_dataset
from dotenv import load_dotenv
from langchain_cohere import ChatCohere, CohereEmbeddings
from langchain_core.documents import Document
from langchain_core.globals import set_llm_cache
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_couchbase.cache import CouchbaseCache
from langchain_couchbase.vectorstores import CouchbaseVectorStore
from tqdm import tqdm

# Setup Logging
Logging is configured to track the progress of the script and capture any errors or warnings. This is crucial for debugging and understanding the flow of execution. The logging output includes timestamps, log levels (e.g., INFO, ERROR), and messages that describe what is happening in the script.


In [8]:
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s', force=True)

# Loading Environment Variables
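This cell collects the connection settings and credentials the script needs, including sensitive values such as the Cohere API key and the Couchbase password. Rather than reading them from environment variables, the notebook prompts for each value: `getpass` hides the API key and password as they are typed, and `input` reads the remaining settings, falling back to the default shown in the prompt when the answer is left empty. Keeping secrets out of the source helps keep the code clean and secure.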

In [9]:
COHERE_API_KEY = getpass.getpass('Enter your Cohere API key: ')
CB_HOST = input('Enter your Couchbase host (default: couchbase://localhost): ') or 'couchbase://localhost'
CB_USERNAME = input('Enter your Couchbase username (default: Administrator): ') or 'Administrator'
CB_PASSWORD = getpass.getpass('Enter your Couchbase password (default: password): ') or 'password'
CB_BUCKET_NAME = input('Enter your Couchbase bucket name (default: vector-search-testing): ') or 'vector-search-testing'
INDEX_NAME = input('Enter your index name (default: vector_search_cohere): ') or 'vector_search_cohere'
SCOPE_NAME = input('Enter your scope name (default: shared): ') or 'shared'
COLLECTION_NAME = input('Enter your collection name (default: cohere): ') or 'cohere'
CACHE_COLLECTION = input('Enter your cache collection name (default: cache): ') or 'cache'

# Fail early if the required API key was not entered
if not COHERE_API_KEY:
    raise ValueError("COHERE_API_KEY is not provided and is required.")

Enter your Cohere API key: ··········
Enter your Couchbase host (default: couchbase://localhost): couchbases://cb.hlcup4o4jmjr55yf.cloud.couchbase.com
Enter your Couchbase username (default: Administrator): vector-search-rag-demos
Enter your Couchbase password (default: password): ··········
Enter your Couchbase bucket name (default: vector-search-testing): 
Enter your index name (default: vector_search_cohere): 
Enter your scope name (default: shared): 
Enter your collection name (default: cohere): 
Enter your cache collection name (default: cache): 
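
For non-interactive runs, the same settings could be read from environment variables instead of prompts. The following is a minimal sketch under that assumption; the helper `get_setting` and the variable names `CB_HOST` and `CB_BUCKET_NAME` are hypothetical and only mirror the prompts above. If a `.env` file is used, calling `load_dotenv()` (imported earlier but not called in this notebook) before these lookups would populate `os.environ`.

```python
import os


def get_setting(name, default=None, required=False):
    """Return the environment variable `name`, falling back to `default`."""
    value = os.environ.get(name) or default
    if required and not value:
        raise ValueError(f"{name} is not provided and is required.")
    return value


# Hypothetical variable names, chosen to mirror the interactive prompts.
cb_host_from_env = get_setting("CB_HOST", "couchbase://localhost")
cb_bucket_from_env = get_setting("CB_BUCKET_NAME", "vector-search-testing")
print(cb_host_from_env, cb_bucket_from_env)
```

An empty value is treated like a missing one, which matches the `input(...) or default` pattern used in the cell above.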


# Connect to Couchbase
The script attempts to establish a connection to the Couchbase database using the host and credentials entered above. Couchbase is a NoSQL database known for its flexibility, scalability, and support for various data models, including document-based storage. The connection is authenticated using a username and password, and the script waits up to five seconds for the cluster to become ready before proceeding; if that fails, a `ConnectionError` is raised.




In [10]:
try:
    auth = PasswordAuthenticator(CB_USERNAME, CB_PASSWORD)
    options = ClusterOptions(auth)
    cluster = Cluster(CB_HOST, options)
    cluster.wait_until_ready(timedelta(seconds=5))
    logging.info("Successfully connected to Couchbase")
except Exception as e:
    raise ConnectionError(f"Failed to connect to Couchbase: {str(e)}")

2024-08-23 11:43:35,766 - INFO - Successfully connected to Couchbase


# Setting Up Collections in Couchbase
In Couchbase, data is organized in buckets, which can be further divided into scopes and collections. Think of a collection as a table in a traditional SQL database. Before we can store any data, we need to ensure that our collections exist. If they don't, we must create them. This step is important because it prepares the database to handle the specific types of data our application will process. By setting up collections, we define the structure of our data storage, which is essential for efficient data retrieval and management.

Moreover, setting up collections allows us to isolate different types of data within the same bucket, providing a more organized and scalable data structure. This is particularly useful when dealing with large datasets, as it ensures that related data is stored together, making it easier to manage and query.
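
The setup function in the next cell addresses each collection through a backtick-quoted bucket.scope.collection path, which it builds twice with f-strings. As a small sketch, that path could be built in one place; the helper name `keyspace` is hypothetical, and rejecting backticks inside a name is a simplifying assumption rather than a Couchbase requirement.

```python
def keyspace(bucket_name, scope_name, collection_name):
    """Build a backtick-quoted bucket.scope.collection path for a query string."""
    parts = (bucket_name, scope_name, collection_name)
    for part in parts:
        # Simplifying assumption: refuse names that would break the quoting.
        if "`" in part:
            raise ValueError(f"Backtick not allowed in identifier: {part!r}")
    return ".".join(f"`{part}`" for part in parts)


print(f"DELETE FROM {keyspace('vector-search-testing', 'shared', 'cohere')}")
```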


In [11]:
def setup_collection(cluster, bucket_name, scope_name, collection_name):
    try:
        bucket = cluster.bucket(bucket_name)
        bucket_manager = bucket.collections()

        # Check if collection exists, create if it doesn't
        collections = bucket_manager.get_all_scopes()
        collection_exists = any(
            scope.name == scope_name and collection_name in [col.name for col in scope.collections]
            for scope in collections
        )

        if not collection_exists:
            logging.info(f"Collection '{collection_name}' does not exist. Creating it...")
            bucket_manager.create_collection(scope_name, collection_name)
            logging.info(f"Collection '{collection_name}' created successfully.")
        else:
            logging.info(f"Collection '{collection_name}' already exists. Skipping creation.")

        collection = bucket.scope(scope_name).collection(collection_name)

        # Ensure primary index exists
        try:
            cluster.query(f"CREATE PRIMARY INDEX IF NOT EXISTS ON `{bucket_name}`.`{scope_name}`.`{collection_name}`").execute()
            logging.info("Primary index present or created successfully.")
        except Exception as e:
            logging.warning(f"Error creating primary index: {str(e)}")

        # Clear all documents in the collection
        try:
            query = f"DELETE FROM `{bucket_name}`.`{scope_name}`.`{collection_name}`"
            cluster.query(query).execute()
            logging.info("All documents cleared from the collection.")
        except Exception as e:
            logging.warning(f"Error while clearing documents: {str(e)}. The collection might be empty.")

        return collection
    except Exception as e:
        raise RuntimeError(f"Error setting up collection: {str(e)}")

setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, COLLECTION_NAME)
setup_collection(cluster, CB_BUCKET_NAME, SCOPE_NAME, CACHE_COLLECTION)

2024-08-23 11:43:40,297 - INFO - Collection 'cohere' does not exist. Creating it...
2024-08-23 11:43:40,506 - INFO - Collection 'cohere' created successfully.
2024-08-23 11:43:43,803 - INFO - Primary index present or created successfully.
2024-08-23 11:43:43,852 - INFO - All documents cleared from the collection.
2024-08-23 11:43:44,024 - INFO - Collection 'cache' already exists.Skipping creation.
2024-08-23 11:43:44,055 - INFO - Primary index present or created successfully.
2024-08-23 11:43:44,086 - INFO - All documents cleared from the collection.


<couchbase.collection.Collection at 0x79efd1ffac50>

# Load Index Definition
The search index definition is loaded from a JSON file. This index defines how the data in Couchbase should be indexed for fast search and retrieval. Indexing is critical for optimizing search queries, especially when dealing with large datasets. The JSON file contains details about the index, such as its name, source type, and parameters.

In [12]:
# index_definition_path = '/path_to_your_index_file/cohere_index.json'

# Prompt the user to upload the index definition file to the Colab runtime
from google.colab import files
print("Upload your index definition file")
uploaded = files.upload()
index_definition_path = list(uploaded.keys())[0]

try:
    with open(index_definition_path, 'r') as file:
        index_definition = json.load(file)
except Exception as e:
    raise ValueError(f"Error loading index definition from {index_definition_path}: {str(e)}")

Upload your index definition file


Saving cohere_index.json to cohere_index.json
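
The index is created under the name stored in the JSON file, while the vector store is later configured with `INDEX_NAME`, so the two need to agree. A possible sanity check is sketched below; the helper `check_index_name` is hypothetical, and the sample dictionary is a stand-in for the parsed file in which only the "name" key is inspected.

```python
def check_index_name(index_definition, expected_name):
    """Raise ValueError if the definition's name differs from the expected index name."""
    actual = index_definition.get("name")
    if actual != expected_name:
        raise ValueError(
            f"Index definition is named {actual!r}, "
            f"but the vector store expects {expected_name!r}."
        )
    return actual


# Stand-in for the parsed JSON file; a real definition has many more keys.
sample_definition = {"name": "vector_search_cohere"}
print(check_index_name(sample_definition, "vector_search_cohere"))
```

In the notebook this would be called as `check_index_name(index_definition, INDEX_NAME)` right after the file is loaded.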


# Create or Update Search Index
The script checks if the search index already exists in Couchbase. If it exists, the index is updated; if not, a new index is created. This step ensures that the data is properly indexed, allowing for efficient search operations later in the script. The index is associated with a specific bucket, scope, and collection in Couchbase, which organizes the data.


In [13]:
try:
    scope_index_manager = cluster.bucket(CB_BUCKET_NAME).scope(SCOPE_NAME).search_indexes()

    # Check if index already exists
    existing_indexes = scope_index_manager.get_all_indexes()
    index_name = index_definition["name"]

    if index_name in [index.name for index in existing_indexes]:
        logging.info(f"Index '{index_name}' found")
    else:
        logging.info(f"Creating new index '{index_name}'...")

    # Create SearchIndex object from JSON definition
    search_index = SearchIndex.from_json(index_definition)

    # Upsert the index (create if not exists, update if exists)
    scope_index_manager.upsert_index(search_index)
    logging.info(f"Index '{index_name}' successfully created/updated.")

except QueryIndexAlreadyExistsException:
    logging.info(f"Index '{index_name}' already exists. Skipping creation/update.")

except InternalServerFailureException as e:
    error_message = str(e)
    logging.error(f"InternalServerFailureException raised: {error_message}")

    try:
        # Accessing the response_body attribute from the context
        error_context = e.context
        response_body = error_context.response_body
        if response_body:
            error_details = json.loads(response_body)
            error_message = error_details.get('error', '')

            if "collection: 'cohere' doesn't belong to scope: 'shared'" in error_message:
                raise ValueError("Collection 'cohere' does not belong to scope 'shared'. Please check the collection and scope names.")

    except ValueError as ve:
        logging.error(str(ve))
        raise

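    except Exception as json_error:
        logging.error(f"Failed to parse the error message: {json_error}")
        raise RuntimeError(f"Internal server error while creating/updating search index: {error_message}")

    # Re-raise any other internal server failure instead of continuing silently
    raise RuntimeError(f"Internal server error while creating/updating search index: {error_message}") from e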

2024-08-23 11:44:24,232 - INFO - Creating new index 'vector_search_cohere'...
2024-08-23 11:44:24,437 - INFO - Index 'vector_search_cohere' successfully created/updated.


# Load TREC Dataset
The TREC dataset is loaded using the datasets library. TREC is a well-known dataset used in information retrieval and natural language processing (NLP) tasks. In this script, the dataset will be used to generate embeddings, which are numerical representations of text that capture its meaning in a form suitable for machine learning models.


In [14]:
try:
    trec = load_dataset('trec', split='train[:1000]')
    logging.info(f"Successfully loaded TREC dataset with {len(trec)} samples")
except Exception as e:
    raise ValueError(f"Error loading TREC dataset: {str(e)}")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Downloading builder script:   0%|          | 0.00/5.09k [00:00<?, ?B/s]

Downloading readme:   0%|          | 0.00/10.6k [00:00<?, ?B/s]

The repository for trec contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/trec.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] y


Downloading data:   0%|          | 0.00/336k [00:00<?, ?B/s]

Downloading data:   0%|          | 0.00/23.4k [00:00<?, ?B/s]

Generating train split:   0%|          | 0/5452 [00:00<?, ? examples/s]

Generating test split:   0%|          | 0/500 [00:00<?, ? examples/s]

2024-08-23 11:44:43,946 - INFO - Successfully loaded TREC dataset with 1000 samples


# Create Embeddings
Embeddings are created using the Cohere API. Embeddings are vectors (arrays of numbers) that represent the meaning of text in a high-dimensional space. These embeddings are crucial for tasks like semantic search, where the goal is to find text that is semantically similar to a query. The script uses a pre-trained model provided by Cohere to generate embeddings for the text in the TREC dataset.

In [15]:
try:
    embeddings = CohereEmbeddings(
        cohere_api_key=COHERE_API_KEY,
        model="embed-english-v3.0",
    )
    logging.info("Successfully created CohereEmbeddings")
except Exception as e:
    raise ValueError(f"Error creating CohereEmbeddings: {str(e)}")

2024-08-23 12:18:34,553 - INFO - Successfully created CohereEmbeddings


# Set Up Vector Store
The vector store is set up to manage the embeddings created in the previous step. The vector store is essentially a database optimized for storing and retrieving high-dimensional vectors. In this case, the vector store is built on top of Couchbase, allowing the script to store the embeddings in a way that can be efficiently searched.


In [16]:
try:
    vector_store = CouchbaseVectorStore(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=COLLECTION_NAME,
        embedding=embeddings,
        index_name=INDEX_NAME,
    )
    logging.info("Successfully created vector store")
except Exception as e:
    raise ValueError(f"Failed to create vector store: {str(e)}")

2024-08-23 12:18:37,927 - INFO - Successfully created vector store


# Save Data to Vector Store in Batches
To avoid overloading memory, the TREC dataset's text fields are saved to the vector store in batches. This step is important for handling large datasets, as it breaks down the data into manageable chunks that can be processed sequentially. Each piece of text is converted into a document, assigned a unique identifier, and then stored in the vector store.


In [17]:
try:
    batch_size = 50
    for i in tqdm(range(0, len(trec['text']), batch_size), desc="Processing Batches"):
        batch = trec['text'][i:i + batch_size]
        documents = [Document(page_content=text) for text in batch]
        uuids = [str(uuid4()) for _ in range(len(documents))]
        vector_store.add_documents(documents=documents, ids=uuids)
except Exception as e:
    raise RuntimeError(f"Failed to save documents to vector store: {str(e)}")

Processing Batches:   0%|          | 0/20 [00:00<?, ?it/s]2024-08-23 12:18:42,103 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
Processing Batches:   5%|▌         | 1/20 [00:01<00:23,  1.24s/it]2024-08-23 12:18:43,317 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
Processing Batches:  10%|█         | 2/20 [00:02<00:21,  1.20s/it]2024-08-23 12:18:44,443 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
Processing Batches:  15%|█▌        | 3/20 [00:03<00:19,  1.16s/it]2024-08-23 12:18:45,528 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
Processing Batches:  20%|██        | 4/20 [00:04<00:17,  1.12s/it]2024-08-23 12:18:47,333 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
Processing Batches:  25%|██▌       | 5/20 [00:06<00:20,  1.37s/it]2024-08-23 12:18:48,618 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
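
The batching pattern used above can be isolated into a small generator. This is a sketch with placeholder strings rather than the TREC texts; the helper name `batched` is an assumption and not part of the notebook.

```python
def batched(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements each."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


texts = [f"question {n}" for n in range(1000)]  # placeholder data
batches = list(batched(texts, 50))
print(len(batches), len(batches[0]), len(batches[-1]))  # 20 50 50
```

With 1,000 texts and a batch size of 50 this yields 20 batches, which matches the `0/20` total shown in the progress bar.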


# Set Up Cache
A cache is set up using Couchbase to store language model responses, so that an identical prompt does not have to be sent to the model again. Caching is important for improving performance, as it reduces the need to repeatedly compute the same result. The cache is linked to a specific collection in Couchbase and registered globally with `set_llm_cache`, so later calls to the language model use it automatically.


In [18]:
try:
    cache = CouchbaseCache(
        cluster=cluster,
        bucket_name=CB_BUCKET_NAME,
        scope_name=SCOPE_NAME,
        collection_name=CACHE_COLLECTION,
    )
    logging.info("Successfully created cache")
    set_llm_cache(cache)
except Exception as e:
    raise ValueError(f"Failed to create cache: {str(e)}")

2024-08-23 12:19:11,578 - INFO - Successfully created cache


# Create Language Model (LLM)
The script initializes a Cohere language model (LLM) that will be used for generating responses to queries. LLMs are powerful tools for natural language understanding and generation, capable of producing human-like text based on input prompts. The model is configured with specific parameters, such as the temperature, which controls the randomness of its outputs.


In [19]:
try:
    llm = ChatCohere(
        cohere_api_key=COHERE_API_KEY,
        model="command",
        temperature=0
    )
    logging.info("Successfully created Cohere LLM with model command")
except Exception as e:
    raise ValueError(f"Error creating Cohere LLM: {str(e)}")

2024-08-23 12:19:15,665 - INFO - Successfully created Cohere LLM with model command
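
To give an intuition for the temperature parameter, the sketch below applies the textbook temperature-scaled softmax to a few made-up scores. It illustrates the general idea only; how Cohere applies temperature internally is not covered in this notebook, and the function name and example scores are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Limit case: all probability goes to the highest-scoring option.
        best = max(range(len(logits)), key=lambda i: logits[i])
        return [1.0 if i == best else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


scores = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0, 0.5, 2.0):
    print(t, [round(p, 3) for p in softmax_with_temperature(scores, t)])
```

At temperature 0 the highest-scoring option is always chosen, which is why this setting tends to give repeatable answers.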


# Retrieval-Augmented Generation (RAG)
RAG is a technique used in natural language processing that combines two key components: a retrieval system and a generative language model. The goal is to enhance the quality and relevance of generated responses by leveraging external knowledge stored in a database or other information sources.

## Components of a RAG Chain
### Retrieval System (Vector Store):

The first part of the RAG process involves a retrieval system, typically based on dense vector representations (embeddings) of documents or texts.
This system takes a query as input and searches for the most relevant documents within a large collection of texts stored in a vector database (vector store). The relevance is measured by the similarity between the query's embedding and the embeddings of the stored documents.
The output of the retrieval system is a set of documents that are most relevant to the query. These documents provide context that can be used to generate a more informed and accurate response.

### Generative Language Model (LLM):

The second part of the RAG chain is a generative language model, such as GPT or Claude. This model takes the retrieved documents (context) along with the original query and generates a response.
The generative model is enhanced by the additional context provided by the retrieval system, enabling it to generate more accurate, context-aware responses.

## How the RAG Chain Works
### Input Query:

The process starts with an input query from the user, such as a question or a request for information.

### Context Retrieval:

The vector store, which contains embeddings of a large set of documents, is queried with the input. The vector store returns the most relevant documents (based on their embeddings) that match the query.

### Formatting Context:

The retrieved documents are then formatted into a coherent context that the generative model can use. This step usually involves concatenating the texts of the documents into a single string or structured format.

### Contextual Response Generation:

The generative language model receives the formatted context along with the original query. Using this additional information, the model generates a response that is more informed and accurate than what it could produce using the query alone.

### Final Output:

The generated response is returned to the user. This response is augmented by the context retrieved from the vector store, making it more relevant and grounded in factual information.

## Example Flow in the RAG Chain
Consider a query: "What caused the 1929 Great Depression?"

### Retrieval:
The vector store searches its collection of documents for texts that mention or discuss the 1929 Great Depression. It retrieves a few key documents that provide relevant context.

### Generation:
The generative model then takes the retrieved documents and the original query to generate a detailed and informed response, such as "The 1929 Great Depression was primarily caused by the stock market crash in October 1929, along with subsequent failures in banking and reduction in consumer spending."

### Output:
The final response, enriched by the retrieved context, is returned to the user.
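
The retrieve, format, and generate steps can be sketched end to end without any external service. In this toy version, word overlap stands in for embedding similarity and the generator simply returns the prompt it would send to a model; the function names and the three sample sentences are illustrative assumptions.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, documents, k=2):
    """Rank documents by the number of words shared with the query (toy retriever)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for score, doc in scored[:k] if score > 0]


def format_context(docs):
    """Concatenate the retrieved texts into a single context string."""
    return "\n".join(docs)


def generate(context, question):
    """Stub generator: returns the prompt a real model would receive."""
    return f"Context:\n{context}\n\nQuestion: {question}"


def rag_answer(question, documents):
    return generate(format_context(retrieve(question, documents)), question)


# Made-up sample documents for illustration only.
documents = [
    "The stock market crash of October 1929 preceded the Great Depression.",
    "Photosynthesis converts sunlight into chemical energy.",
    "Paris is France's capital city.",
]
print(rag_answer("What caused the 1929 Great Depression?", documents))
```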

## Why Use a RAG Chain?
- Improved Accuracy: By augmenting the generation process with relevant, up-to-date information, the model can produce more accurate and context-aware responses.
- Handling Complex Queries: RAG is particularly effective for complex queries that require specific information, as it can pull in relevant context that the model itself might not "know."
- Reduced Hallucination: Generative models sometimes "hallucinate" information. By grounding responses in retrieved documents, RAG helps reduce the likelihood of generating incorrect or misleading information.

## In the Provided Code
In the provided code, the RAG chain is created with the following steps:

1. Prompt Template: A single template string sets the model's role (a helpful bot that answers from the provided context) and contains placeholders for the retrieved context and the question.
2. Chat Prompt: `ChatPromptTemplate.from_template` turns the template string into a chat prompt.
3. Input Mapping: The retriever returned by `vector_store.as_retriever()` fills `{context}`, and `RunnablePassthrough()` forwards the query unchanged as `{question}`.
4. Chain Execution: The chain retrieves relevant documents from the vector store, fills the prompt, sends it to the generative model, and uses `StrOutputParser` to return the response as plain text.

The output is a response that leverages both the stored knowledge (retrieved documents) and the generative capabilities of the language model.



In [20]:
template = """You are a helpful bot. If you cannot answer based on the context provided, respond with a generic answer. Answer the question as truthfully as possible using the context below:
{context}

Question: {question}"""
prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": vector_store.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | llm
    | StrOutputParser()
)
logging.info("Successfully created RAG chain")

2024-08-23 12:19:18,507 - INFO - Successfully created RAG chain
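
In the chain above, the retriever's output is a list of `Document` objects, and the prompt template inserts that list as-is. A common refinement is to join only the text of each document before it reaches the prompt. The sketch below shows such a helper; `Doc` is a stand-in for `langchain_core.documents.Document` so that the example runs on its own, and `format_docs` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass
class Doc:
    """Stand-in for langchain_core.documents.Document (only page_content is used)."""
    page_content: str


def format_docs(docs):
    """Join the text of each retrieved document, separated by blank lines."""
    return "\n\n".join(doc.page_content for doc in docs)


docs = [Doc("When was `` the Great Depression '' ?"), Doc("What causes pneumonia ?")]
print(format_docs(docs))
```

In the notebook, the helper should be usable by writing the context entry as `vector_store.as_retriever() | format_docs`, since LangChain wraps plain functions that are piped into a chain; this variant was not run as part of the recorded outputs.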


In [21]:
query = "What caused the 1929 Great Depression?"

# Get RAG response
start_time = time.time()
rag_response = rag_chain.invoke(query)
rag_elapsed_time = time.time() - start_time
logging.info(f"RAG response generated in {rag_elapsed_time:.2f} seconds")
print(f"RAG Response: {rag_response}")

2024-08-23 12:19:22,725 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2024-08-23 12:19:26,918 - INFO - HTTP Request: POST https://api.cohere.com/v1/chat "HTTP/1.1 200 OK"
2024-08-23 12:19:26,953 - INFO - RAG response generated in 4.42 seconds


RAG Response: The 1929 Great Depression was caused by an interplay of events, including monetary policy mistakes, a devastating crop failure, and a lack of coordination between governments and banks. Specifically, in 1928, the Federal Reserve attempted to rein in the boom that occurred between 1924 and 1927 by raising interest rates. This action slowed down investment and caused a stock market crash in October 1929, which led to a banking crisis as investors lost confidence in the solvency of banks and demanded their money in cash, forcing banks to liquidate their assets and contract lending. The resulting lack of liquidity in the economy further exacerbated the crisis, causing a downward economic spiral.


# Perform Semantic Search
Semantic search is the process of finding documents that are semantically (or meaningfully) similar to a given query, rather than relying solely on keyword matching. The goal is to understand the context and meaning behind the words in the query and compare them with the stored documents.
## Embedding and Vector Representation

Text Embedding:
Both the query and the documents are converted into fixed-length vectors of real numbers, known as embeddings. These embeddings capture the semantic meaning of the text. The process involves using a pre-trained model, in this notebook Cohere's `embed-english-v3.0`, to map text into a high-dimensional space where semantically similar texts have closer embeddings.

## Similarity Measurement

### Cosine Similarity:
Once the query and document embeddings are generated, the similarity between the query vector and each document vector is typically measured using cosine similarity.

Cosine similarity between two vectors A and B is given by:

![Cosine Similarity](https://github.com/masongallo/masongallo.github.io/raw/master/assets/cos_sim.png)

where A⋅B is the dot product of the two vectors, and ∥A∥ and ∥B∥ are the magnitudes of vectors A and B, respectively.

### Distance (1 - Cosine Similarity):
The distance is often defined as 1 - Cosine Similarity, where a smaller distance indicates higher similarity.
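
As a small worked example, both quantities can be computed with the standard library alone. The function names and the two-dimensional vectors below are illustrative assumptions; real embeddings have many more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their magnitudes."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """Distance defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0 (same direction)
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))    # 1.0 (orthogonal vectors)
```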

## Similarity Search

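The search involves calculating the similarity (or distance) between the query vector and the document vectors in the vector store. The documents with the smallest distance (or highest similarity) are considered the most relevant to the query. Note that in the results below, the value printed as `Distance` is the score returned by `similarity_search_with_score`. The results are listed from the highest value to the lowest, with the closest match first, so in this output a higher value indicates a better match.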


In [22]:
try:
    start_time = time.time()
    search_results = vector_store.similarity_search_with_score(query, k=10)
    elapsed_time = time.time() - start_time
    results = [{'id': doc.metadata.get('id', 'N/A'), 'text': doc.page_content, 'distance': score}
               for doc, score in search_results]
    logging.info(f"Semantic search completed in {elapsed_time:.2f} seconds")
    print(f"\nSemantic Search Results (completed in {elapsed_time:.2f} seconds):")
    for result in results:
        print(f"Distance: {result['distance']:.4f}, Text: {result['text']}")
except CouchbaseException as e:
    raise RuntimeError(f"Error performing semantic search: {str(e)}")


2024-08-23 12:19:32,297 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"
2024-08-23 12:19:32,457 - INFO - Semantic search completed in 0.40 seconds



Semantic Search Results (completed in 0.40 seconds):
Distance: 0.6202, Text: Why did the world enter a global depression in 1929 ?
Distance: 0.4897, Text: When was `` the Great Depression '' ?
Distance: 0.3765, Text: What crop failure caused the Irish Famine ?
Distance: 0.3185, Text: What caused Harry Houdini 's death ?
Distance: 0.3056, Text: What causes pneumonia ?
Distance: 0.3031, Text: When did World War I start ?
Distance: 0.2901, Text: What war did the Wanna-Go-Home Riots occur after ?
Distance: 0.2855, Text: What caused the Lynmouth floods ?
Distance: 0.2785, Text: What historical event happened in Dogtown in 1899 ?
Distance: 0.2638, Text: What sports event is Meyer Wolfsheim supposed to have fixed in The Great Gatsby ?
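
Conceptually, the search ranks stored vectors by their similarity to the query vector and keeps the top `k`. The brute-force sketch below shows that idea on three made-up vectors; it is not how the Couchbase search index works internally, and the names `cosine`, `top_k`, and `doc_vecs` are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(query_vec, doc_vecs, k=2):
    """Return the k (doc_id, score) pairs with the highest similarity to the query."""
    scored = [(cosine(query_vec, vec), doc_id) for doc_id, vec in doc_vecs.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(doc_id, score) for score, doc_id in scored[:k]]


# Made-up two-dimensional "embeddings" keyed by document id.
doc_vecs = {"a": [1.0, 0.0], "b": [0.6, 0.8], "c": [0.0, 1.0]}
results = top_k([1.0, 0.0], doc_vecs, k=2)
for doc_id, score in results:
    print(f"{doc_id}: {score:.4f}")
```

As in the output above, the best match comes first and the scores decrease down the list.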


# Using Couchbase as a caching mechanism
Couchbase can serve as a cache for the language model calls made by the RAG (Retrieval-Augmented Generation) chain: the response generated for a prompt is stored and reused when the same prompt is sent again. This caching mechanism improves the efficiency and speed of the system for repeated queries. Here's how Couchbase acts as a cache in this context:
1. **Storing Responses in Couchbase:**
    - Key-Value Storage: Couchbase is a NoSQL database that can operate as a distributed key-value store. Each response generated by the language model is stored in the cache collection, where the prompt sent to the model (together with the model configuration) serves as the key, and the corresponding response serves as the value.
    - Caching Strategy: When the chain calls the language model, LangChain first checks whether a cached response exists in Couchbase for that prompt. If not, the model generates a response, which is then stored in Couchbase.

2. **Retrieving Cached Responses:**
    - Cache Lookup: On subsequent requests with the same query, the chain still embeds the query and retrieves documents, but the language model call is looked up in Couchbase first. If the retrieved context and the question produce the same prompt as before, the cached response is returned without calling the model again.
    - Reduced Latency: Skipping response generation, the slowest step in the runs logged in this notebook, significantly reduces the response time. This is particularly useful for frequently repeated queries.

3. **Integration with the RAG Chain:**
    - Global LLM Cache: In this notebook the cache is registered with `set_llm_cache(cache)`, so LangChain consults it automatically whenever `llm` is invoked; the chain definition itself does not change.
    - Conditional Processing: If a response is found in Couchbase, it is returned immediately. If not, the model is called as usual, and the generated response is cached for future use.

4. **Example Flow with Couchbase Cache:**
    - Initial Query:
      - Query: "What caused the 1929 Great Depression?"
      - Retrieval: The RAG chain embeds the query, retrieves relevant documents, and builds the prompt.
      - Cache Miss: Couchbase is checked for this prompt, but no cached response is found.
      - Generation and Caching: The language model generates a response, which is stored in Couchbase and returned to the user.
    - Subsequent Query:
      - Query: "What caused the 1929 Great Depression?" (repeated)
      - Cache Hit: The chain retrieves the same documents and builds the same prompt, so the cached response is found.
      - Direct Retrieval: The response is read from Couchbase and returned to the user without calling the language model again, resulting in a much faster response time.
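
The lookup-then-update flow can be sketched with an in-memory dictionary in place of Couchbase. The class below mirrors the `lookup`/`update` shape of LangChain's cache interface, where an entry is identified by the prompt text plus a string describing the model configuration; `ExactMatchCache`, `fake_llm`, and `cached_generate` are hypothetical names, and the fake model only records that it was called.

```python
class ExactMatchCache:
    """In-memory stand-in for a cache keyed by (prompt, llm_string)."""

    def __init__(self):
        self._store = {}

    def lookup(self, prompt, llm_string):
        return self._store.get((prompt, llm_string))

    def update(self, prompt, llm_string, response):
        self._store[(prompt, llm_string)] = response


calls = []  # records every prompt that actually reaches the fake model


def fake_llm(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"


def cached_generate(cache, prompt, llm_string="command/temperature=0"):
    """Return the cached response if present; otherwise call the model and store the result."""
    cached = cache.lookup(prompt, llm_string)
    if cached is not None:
        return cached  # cache hit: the model is not called
    response = fake_llm(prompt)  # cache miss: generate, then store
    cache.update(prompt, llm_string, response)
    return response


demo_cache = ExactMatchCache()
cached_generate(demo_cache, "What caused the 1929 Great Depression?")
cached_generate(demo_cache, "What caused the 1929 Great Depression?")
print(f"model calls: {len(calls)}")  # model calls: 1
```

Because the match is exact, any change to the prompt, including a different retrieved context, results in a cache miss.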

In [23]:
queries = [
        "How does photosynthesis work?",
        "What is the capital of France?",
        "What caused the 1929 Great Depression?",  # Repeated query
        "How does photosynthesis work?",  # Repeated query
    ]

for i, query in enumerate(queries, 1):
    print(f"\nQuery {i}: {query}")
    start_time = time.time()
    response = rag_chain.invoke(query)
    elapsed_time = time.time() - start_time
    print(f"Response: {response}")
    print(f"Time taken: {elapsed_time:.2f} seconds")

2024-08-23 12:19:37,476 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"



Query 1: How does photosynthesis work?


2024-08-23 12:19:47,292 - INFO - HTTP Request: POST https://api.cohere.com/v1/chat "HTTP/1.1 200 OK"
2024-08-23 12:19:47,412 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"


Response: Photosynthesis is a process by which plants convert sunlight into chemical energy, specifically glucose. The process occurs in special structures called chloroplasts within the plant's cells. Here are the steps behind photosynthesis:

1. Absorption of sunlight: Chlorophyll, a pigment found in chloroplasts, captures the energy of sunlight.

2. Light reaction: This occurs when the absorbed sunlight energy is used to split water molecules into hydrogen ions and electrons. The hydrogen ions move through the membrane to the interior of the chloroplast while the electrons initiate a series of electron transfers that eventually lead to the synthesis of ATP molecules.

3. Carbon fixation: The ATP and hydrogen ions combine in a process called chemiosmosis, producing NADPH molecules and releasing energy. Carbon dioxide from the atmosphere is then combined with this energy, along with the electrons returned from the electron transport chain. This results in the production of glucose mol

2024-08-23 12:19:47,990 - INFO - HTTP Request: POST https://api.cohere.com/v1/chat "HTTP/1.1 200 OK"
2024-08-23 12:19:48,183 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"


Response: Paris is the capital of France.
Time taken: 0.70 seconds

Query 3: What caused the 1929 Great Depression?
Response: The 1929 Great Depression was caused by an interplay of events, including monetary policy mistakes, a devastating crop failure, and a lack of coordination between governments and banks. Specifically, in 1928, the Federal Reserve attempted to rein in the boom that occurred between 1924 and 1927 by raising interest rates. This action slowed down investment and caused a stock market crash in October 1929, which led to a banking crisis as investors lost confidence in the solvency of banks and demanded their money in cash, forcing banks to liquidate their assets and contract lending. The resulting lack of liquidity in the economy further exacerbated the crisis, causing a downward economic spiral.
Time taken: 0.48 seconds

Query 4: How does photosynthesis work?


2024-08-23 12:19:49,447 - INFO - HTTP Request: POST https://api.cohere.com/v1/embed "HTTP/1.1 200 OK"


Response: Photosynthesis is a process by which plants convert sunlight into chemical energy, specifically glucose. The process occurs in special structures called chloroplasts within the plant's cells. Here are the steps behind photosynthesis:

1. Absorption of sunlight: Chlorophyll, a pigment found in chloroplasts, captures the energy of sunlight.

2. Light reaction: This occurs when the absorbed sunlight energy is used to split water molecules into hydrogen ions and electrons. The hydrogen ions move through the membrane to the interior of the chloroplast while the electrons initiate a series of electron transfers that eventually lead to the synthesis of ATP molecules.

3. Carbon fixation: The ATP and hydrogen ions combine in a process called chemiosmosis, producing NADPH molecules and releasing energy. Carbon dioxide from the atmosphere is then combined with this energy, along with the electrons returned from the electron transport chain. This results in the production of glucose mol