# VectorStore-backed memory

## Colab-specific setup

Make sure you have a Database, a Secure Connect Bundle and a Database Token. See [DB Setup](https://cassio.org/db_setup/) on cassio.org for details, **paying attention to the notes for Colab users**.
Get ready to upload the Bundle and supply the Token string.

Likewise, ensure you have the necessary secret for the LLM provider of your choice: you'll be asked to input it shortly. See [API Setup](https://cassio.org/api_setup/) on cassio.org for details, **paying attention to the notes for Colab users**.

_Note: this colab is autogenerated from a [regular Jupyter notebook](https://cassio.org/frameworks/langchain/memory-vectorstore/) hosted on cassio.org. Run all cells in this section to complete the setup before moving on to the demo content proper._

_Note: you can work with your own Cassandra cluster instead of Astra DB, provided it is reachable from the cloud: check the commented code below for the case `cqlMode="local"` and consult [cassio.org](https://cassio.org/frameworks/langchain/setup/#database-choice) for more details._

In [None]:
# install required dependencies
! pip install \
    "git+https://github.com/hemidactylus/langchain@cassio#egg=langchain" \
    "cassandra-driver>=3.28.0" \
    "cassio>=0.0.4" \
    "google-cloud-aiplatform>=1.25.0" \
    "jupyter>=1.0.0" \
    "openai==0.27.7" \
    "python-dotenv==1.0.0" \
    "tensorflow-cpu==2.12.0" \
    "tiktoken==0.4.0" \
    "transformers>=4.29.2"

In [None]:
# Input your database keyspace name:
ASTRA_DB_KEYSPACE = input('Your Astra DB Keyspace name: ')

In [None]:
# Input your Astra DB token string, the one starting with "AstraCS:..."
ASTRA_DB_APPLICATION_TOKEN = input('Your Astra DB Token: ')

### Astra DB Secure Connect Bundle

Please upload the Secure Connect Bundle zipfile to connect to your Astra DB instance.

The Secure Connect Bundle is needed to establish a secure connection to the database.
Click [here](https://awesome-astra.github.io/docs/pages/astra/download-scb/#c-procedure) for instructions on how to download it from Astra DB.

In [None]:
# Upload your Secure Connect Bundle zipfile:
import os
from google.colab import files


print('Please upload your Secure Connect Bundle')
uploaded = files.upload()
if uploaded:
    astraBundleFileTitle = list(uploaded.keys())[0]
    ASTRA_DB_SECURE_BUNDLE_PATH = os.path.join(os.getcwd(), astraBundleFileTitle)
else:
    raise ValueError(
        'Cannot proceed without Secure Connect Bundle. Please re-run the cell.'
    )

In [None]:
# colab-specific override of helper functions
from cassandra.cluster import (
    Cluster,
)
from cassandra.auth import PlainTextAuthProvider

ASTRA_DB_CLIENT_ID = 'token'


def getCQLSession(mode='astra_db'):
    if mode == 'astra_db':
        cluster = Cluster(
            cloud={
                "secure_connect_bundle": ASTRA_DB_SECURE_BUNDLE_PATH,
            },
            auth_provider=PlainTextAuthProvider(
                ASTRA_DB_CLIENT_ID,
                ASTRA_DB_APPLICATION_TOKEN,
            ),
        )
        astraSession = cluster.connect()
        return astraSession
    # elif mode == 'local':
    #     cluster = Cluster(
    #         ['192.168.0.1', '192.168.0.2'],
    #         auth_provider=PlainTextAuthProvider(
    #             "username",
    #             "password!",
    #         ),
    #     )
    #     # See https://docs.datastax.com/en/developer/python-driver/latest/getting_started/#connecting-to-cassandra for more options
    #     localSession = cluster.connect()
    #     return localSession
    else:
        raise ValueError('Unknown CQL Session mode')

def getCQLKeyspace(mode='astra_db'):
    if mode == 'astra_db':
        return ASTRA_DB_KEYSPACE
    # elif mode == 'local':
    #     return <NAME_OF_YOUR_LOCAL_KEYSPACE>
    else:
        raise ValueError('Unknown CQL Session mode')

### LLM Provider

In the cell below you can choose between **GCP VertexAI** or **OpenAI** for your LLM services.
Make sure you set the `llmProvider` variable and supply the corresponding access secrets.

In [None]:
# Set your secret(s) for LLM access (key names must match `providerValidator` in `llm_choice`)
llmProvider = 'OpenAI'  # 'GCP_VertexAI'
if llmProvider == 'OpenAI':
    apiSecret = input(f'Your secret for LLM provider "{llmProvider}": ')
    os.environ['OPENAI_API_KEY'] = apiSecret
elif llmProvider == 'GCP_VertexAI':
    # we need a json file
    print(f'Please upload your Service Account JSON for the LLM provider "{llmProvider}":')
    from google.colab import files
    uploaded = files.upload()
    if uploaded:
        vertexAIJsonFileTitle = list(uploaded.keys())[0]
        os.environ['GOOGLE_APPLICATION_CREDENTIALS'] = os.path.join(os.getcwd(), vertexAIJsonFileTitle)
    else:
        raise ValueError(
            'No file uploaded. Please re-run the cell.'
        )
else:
    raise ValueError('Unknown/unsupported LLM Provider')

### Colab preamble completed

The following cells constitute the demo notebook proper.

# VectorStore-backed memory

LangChain's support for the Cassandra vector store enables another interesting use case: a chat memory buffer that injects the most relevant past exchanges into the prompt, rather than the most recent ones (as most other memory types do). This makes it possible to retrieve related context _arbitrarily far back in the chat history_.

All you need to do is instantiate a `Cassandra` vector store and wrap it in a `VectorStoreRetrieverMemory`, a memory type provided by LangChain.

In [1]:
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.llms import OpenAI
from langchain.memory import VectorStoreRetrieverMemory
from langchain.chains import ConversationChain
from langchain.prompts import PromptTemplate

In [2]:
from langchain.vectorstores.cassandra import Cassandra

As usual, a database connection is needed to access Cassandra. The following assumes
that a _vector-search-capable Astra DB instance_ is available. Adjust as needed.

In [3]:
# creation of the DB connection
cqlMode = 'astra_db' # alternatively, 'local' ... if you do have a Cassandra cluster to use, that is
session = getCQLSession(mode=cqlMode)
keyspace = getCQLKeyspace(mode=cqlMode)

Both an LLM and an embedding function are required.

Below is the logic to instantiate the LLM and embeddings of choice. It is kept inline in the notebook for clarity.

In [4]:
# creation of the LLM resources


if llmProvider == 'GCP_VertexAI':
    from langchain.llms import VertexAI
    from langchain.embeddings import VertexAIEmbeddings
    llm = VertexAI()
    myEmbedding = VertexAIEmbeddings()
    print('LLM+embeddings from VertexAI')
elif llmProvider == 'OpenAI':
    from langchain.llms import OpenAI
    from langchain.embeddings import OpenAIEmbeddings
    llm = OpenAI(temperature=0)
    myEmbedding = OpenAIEmbeddings()
    print('LLM+embeddings from OpenAI')
else:
    raise ValueError('Unknown LLM provider.')

LLM+embeddings from OpenAI


## Create the store

In [5]:
table_name = 'vstore_memory_' + llmProvider
cassVStore = Cassandra(
    session=session,
    keyspace=keyspace,
    table_name=table_name,
    embedding=myEmbedding,
)

# just in case this demo runs multiple times
cassVStore.clear()

## Create the retriever and the memory

A "retriever" is spawned from the vector store. The number of items to fetch (the `k` search parameter) is kept intentionally very small for demonstration purposes.

Next, the retriever is wrapped in a `VectorStoreRetrieverMemory`:

In [6]:
retriever = cassVStore.as_retriever(search_kwargs={'k': 3})
semanticMemory = VectorStoreRetrieverMemory(retriever=retriever)
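
To make the role of `k` concrete, here is a minimal, library-free sketch of top-k retrieval by similarity. It is only an illustration: `toy_embed`, `cosine` and `top_k` are hypothetical helpers, and the word-count "embedding" is an assumption standing in for a real embedding model; it is not how the `Cassandra` store is implemented.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, texts, k=3):
    """Return the k stored texts most similar to the query, best first."""
    q = toy_embed(query)
    ranked = sorted(texts, key=lambda t: cosine(q, toy_embed(t)), reverse=True)
    return ranked[:k]


stored = [
    "the blue whale is the biggest mammal",
    "i hate swimming in the sea",
    "my keyboard does not work",
    "install the driver for your operating system",
]
# Insertion order plays no role: only similarity to the query matters.
hits = top_k("is swimming in the sea fun", stored, k=2)
print(hits)
```

A real embedding model captures meaning rather than shared words, but the ranking-and-truncation step is the same idea.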

Create a fake "past conversation". Note how the topic of the discussion wanders to fixing one's PC in the last few exchanges:

In [7]:
pastExchanges = [
    (
        {"input": "Hello, what is the biggest mammal?"},
        {"output": "The blue whale."},
    ),
    (
        {"input": "... I cannot swim. Actually I hate swimming!"},
        {"output": "I see."},
    ),
    (
        {"input": "I like mountains and beech forests."},
        {"output": "That's good to know."},
    ),
    (
        {"input": "Yes, too much water makes me uneasy."},
        {"output": "Ah, how come?"},
    ),
    (
        {"input": "I guess I am just not a seaside person"},
        {"output": "I see. How may I help you?"},
    ),
    (
        {"input": "I need help installing this driver"},
        {"output": "First download the right version for your operating system."},
    ),
    (
        {"input": "Good grief ... my keyboard does not work anymore!"},
        {"output": "Try plugging it in your PC first."},
    ),
]

Insert these exchanges into the memory:

In [8]:
for exI, exO in pastExchanges:
    semanticMemory.save_context(exI, exO)

Given a conversation input, the `load_memory_variables` method performs a semantic search and returns the most relevant items from the memory, regardless of the order in which they were stored:

In [9]:
QUESTION = "Can you suggest me a sport to try?"

In [10]:
print(semanticMemory.load_memory_variables({"prompt": QUESTION})["history"])

input: ... I cannot swim. Actually I hate swimming!
output: I see.
input: I guess I am just not a seaside person
output: I see. How may I help you?
input: I like mountains and beech forests.
output: That's good to know.
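
The printed history suggests that each exchange is stored as a short block of `key: value` lines, and that the retrieved blocks are then joined together. The sketch below reproduces that shape in plain Python. The helpers `exchange_to_text` and `build_history` are hypothetical names, and the code mirrors only the format seen in the output above, not LangChain's internals.

```python
def exchange_to_text(inputs, outputs):
    """Render one exchange as 'key: value' lines, input keys first."""
    merged = {**inputs, **outputs}
    return "\n".join(f"{key}: {value}" for key, value in merged.items())


def build_history(exchanges):
    """Join the rendered exchanges into a single block for the prompt."""
    return "\n".join(exchange_to_text(i, o) for i, o in exchanges)


# Two of the exchanges from the fake conversation, used as sample data.
retrieved = [
    ({"input": "... I cannot swim. Actually I hate swimming!"}, {"output": "I see."}),
    ({"input": "I like mountains and beech forests."}, {"output": "That's good to know."}),
]
history = build_history(retrieved)
print(history)
```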


## Usage in a conversation chain

This semantic memory element can be used within a full conversation chain.

In the following you'll create a custom prompt and a `ConversationChain` out of it, attaching the latter to the vector-store-powered memory seen above:

In [11]:
semanticMemoryTemplateString = """The following is a between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.

The AI can use information from parts of the previous conversation (only if they are relevant):
{history}

Current conversation:
Human: {input}
AI:"""

memoryPrompt = PromptTemplate(
    input_variables=["history", "input"],
    template=semanticMemoryTemplateString
)

conversationWithVectorRetrieval = ConversationChain(
    llm=llm, 
    prompt=memoryPrompt,
    memory=semanticMemory,
    verbose=True
)
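
As a rough illustration of what happens to the two placeholders, the sketch below fills a shortened copy of the template with plain `str.format`. This assumes simple `{name}`-style substitution; `fill_prompt` and `short_template` are hypothetical names introduced here, and the chain itself takes care of this step when it runs.

```python
short_template = """The AI can use information from parts of the previous conversation (only if they are relevant):
{history}

Current conversation:
Human: {input}
AI:"""


def fill_prompt(template, history, user_input):
    """Substitute the retrieved history and the new user input into the template."""
    return template.format(history=history, input=user_input)


# Sample values: one retrieved exchange and the question asked in this notebook.
sample_history = "input: I like mountains and beech forests.\noutput: That's good to know."
filled = fill_prompt(short_template, sample_history, "Can you suggest me a sport to try?")
print(filled)
```

The memory supplies the `history` value and the user supplies `input`, which is why the `PromptTemplate` declares exactly those two input variables.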

Run the chain with the sports question:

In [12]:
conversationWithVectorRetrieval.predict(input=QUESTION)



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.

The AI can use information from parts of the previous conversation (only if they are relevant):
input: ... I cannot swim. Actually I hate swimming!
output: I see.
input: I guess I am just not a seaside person
output: I see. How may I help you?
input: I like mountains and beech forests.
output: That's good to know.

Current conversation:
Human: Can you suggest me a sport to try?
AI:

> Finished chain.


' Sure, I can suggest some sports for you to try. Depending on your preferences, you could try hiking, running, biking, or even rock climbing. Do any of these sound interesting to you?'

Notice how new exchanges are automatically added to the memory:

In [13]:
conversationWithVectorRetrieval.predict(input="Would I like a swim in a mountain lake?")



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.

The AI can use information from parts of the previous conversation (only if they are relevant):
input: I like mountains and beech forests.
output: That's good to know.
input: Can you suggest me a sport to try?
response:  Sure, I can suggest some sports for you to try. Depending on your preferences, you could try hiking, running, biking, or even rock climbing. Do any of these sound interesting to you?
input: ... I cannot swim. Actually I hate swimming!
output: I see.

Current conversation:
Human: Would I like a swim in a mountain lake?
AI:

> Finished chain.


" That depends on your preferences. If you don't like swimming, then a swim in a mountain lake may not be the best activity for you. However, if you enjoy the outdoors and the scenery of a mountain lake, then it could be a great experience."

... so that the most relevant items for the same question have now changed:

In [14]:
semanticMemory.retriever.get_relevant_documents(QUESTION)

[Document(page_content='input: Can you suggest me a sport to try?\nresponse:  Sure, I can suggest some sports for you to try. Depending on your preferences, you could try hiking, running, biking, or even rock climbing. Do any of these sound interesting to you?', metadata={}),
 Document(page_content="input: Would I like a swim in a mountain lake?\nresponse:  That depends on your preferences. If you don't like swimming, then a swim in a mountain lake may not be the best activity for you. However, if you enjoy the outdoors and the scenery of a mountain lake, then it could be a great experience.", metadata={}),
 Document(page_content='input: ... I cannot swim. Actually I hate swimming!\noutput: I see.', metadata={})]

## A counterexample

What would happen with a simpler memory element, one that just retrieves a certain number of the most recent interactions?

Create and populate an instance of LangChain's `ConversationTokenBufferMemory`, limiting it to a maximum token length of 80 (roughly equivalent to the 3 fragments set for the `semanticMemory` object):

In [15]:
from langchain.memory import ConversationTokenBufferMemory
from langchain.memory import ChatMessageHistory

In [16]:
baseHistory = ChatMessageHistory()

recencyBufferMemory = ConversationTokenBufferMemory(
    chat_memory=baseHistory,
    max_token_limit=80,
    llm=llm,
)

In [None]:
for exI, exO in pastExchanges:
    recencyBufferMemory.save_context(exI, exO)
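
For intuition about a token-limited recency buffer, here is a minimal sketch that drops the oldest messages until the rest fit a budget. It rests on stated assumptions: `count_tokens` is a hypothetical whitespace tokenizer (a real setup would count tokens with the LLM's own tokenizer), and `prune_to_budget` is an illustrative helper, not LangChain's implementation.

```python
def count_tokens(text):
    """Hypothetical tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def prune_to_budget(messages, max_tokens):
    """Drop the oldest messages until the total token count fits the budget."""
    kept = list(messages)
    while kept and sum(count_tokens(m) for m in kept) > max_tokens:
        kept.pop(0)  # the oldest message goes first
    return kept


messages = [
    "Human: Hello, what is the biggest mammal?",
    "AI: The blue whale.",
    "Human: I need help installing this driver",
    "AI: First download the right version for your operating system.",
]
recent = prune_to_budget(messages, max_tokens=18)
print(recent)
```

With this policy, relevance to the next question plays no role in what is kept: older messages are discarded first, whatever their content.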

Time to ask the same sports question. This is what will get injected into the prompt this time:

In [17]:
print(recencyBufferMemory.load_memory_variables({"prompt": QUESTION})["history"])




... and this is the (rather generic) answer you'd get:

In [18]:
conversationWithRecencyRetrieval = ConversationChain(
    llm=llm, 
    prompt=memoryPrompt,
    memory=recencyBufferMemory,
    verbose=True
)

conversationWithRecencyRetrieval.predict(input=QUESTION)



> Entering new ConversationChain chain...
Prompt after formatting:
The following is a between a human and a helpful AI.
The AI is talkative and provides lots of specific details from its context.
If the AI does not know the answer to a question, it truthfully says it does not know.

The AI can use information from parts of the previous conversation (only if they are relevant):


Current conversation:
Human: Can you suggest me a sport to try?
AI:

> Finished chain.


" Sure! Have you ever tried tennis? It's a great sport that can be played both indoors and outdoors. It's a great way to get some exercise and have fun at the same time. Plus, it's a great way to meet new people and make new friends."