# Amazon Keyspaces and LangChain chat history
--- 

LangChain is a popular framework for developing applications powered by large language models (LLMs). LangChain provides tools and abstractions to improve the customization, accuracy, and relevancy of the information the models generate. For example, developers may want to build chat applications that preserve chat history, break down complex prompts into several steps, or allow LLMs to access transactional data in real time without retraining.

Amazon Keyspaces (for Apache Cassandra) is a scalable, highly available, and serverless database service on AWS. With Amazon Keyspaces, you can build applications that scale to thousands of requests per second with consistent latencies, and you can keep using the Apache Cassandra application code and developer tools that you use today. Because Amazon Keyspaces is serverless, you pay for only the resources you use, and the service can automatically scale tables up and down in response to application traffic.

This notebook provides a quickstart for using Amazon Keyspaces with LangChain and Amazon Bedrock. In this notebook you will:
 - Connect to Amazon Keyspaces from LangChain
 - Use a chat model in Amazon Bedrock
 - Store chat history in Amazon Keyspaces


Chat models are a type of language model that operates on a conversational level, taking in chat messages as inputs and generating chat messages as outputs, rather than dealing with plain text. LangChain provides a standardized interface for developers to interact with various chat model providers, such as AWS, OpenAI, Cohere, and Hugging Face, allowing for seamless integration and utilization of these models across different platforms.

One popular application of chat models is task decomposition, a technique that aims to break down complex tasks into smaller, more manageable subtasks. Instead of presenting the model with a large, monolithic input, task decomposition helps agents or models handle difficult tasks by dividing them into more digestible steps. This can be achieved through methods like Chain of Thought (CoT) or Tree of Thoughts, which guide the model through a step-by-step thought process or explore multiple reasoning paths at each stage.

Task decomposition can be implemented in various ways, such as using simple prompting with a language model (LLM), providing task-specific instructions, or leveraging human inputs. For instance, an LLM can be prompted with phrases like "Steps for XYZ" to encourage it to break down a task into steps, or it can be given specific instructions like "Write a story outline" to initiate the task decomposition process. Additionally, human inputs can be incorporated to decompose tasks into smaller, more manageable steps, allowing for a collaborative approach.

To enhance the effectiveness of task decomposition, developers often combine chat models with NoSQL storage solutions. NoSQL databases, with their flexible schema and scalability, provide an ideal storage solution for managing the varying structures and potentially large volumes of subtasks generated during the decomposition process. By integrating chat models and NoSQL storage, developers can create powerful task management and chat applications that leverage the capabilities of generative AI while ensuring efficient storage and retrieval of task-related data.




---
## Prerequisites
Check out the following notebook to get familiar with Amazon Bedrock and SageMaker Studio Jupyter notebooks.
[Amazon Bedrock prerequisites](https://github.com/aws-samples/amazon-bedrock-workshop/blob/main/00_Prerequisites/bedrock_basics.ipynb) 

### IAM
If you are running this notebook from Amazon SageMaker Studio and your SageMaker Studio execution role has permissions to access Amazon Bedrock and Amazon Keyspaces, you can run the cells below as-is. This is also the case if you are running these notebooks from a computer whose default AWS credentials have access to Amazon Bedrock and Amazon Keyspaces.

### Python libraries
Install the Python dependencies, including:
* boto3, LangChain, the Cassandra driver, and Keyspaces SigV4 authentication




In [None]:
%pip install --upgrade --quiet --no-build-isolation --force-reinstall \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57" \
    "cassio>=0.1.0" \
    "cassandra-sigv4>=4.0.2"

%pip install --upgrade --quiet langchain langchain-community langchainhub langchain-anthropic bs4  langchain-aws
        

--- 
### Connect and authenticate using IAM

The following cells test authentication with Amazon Bedrock and Amazon Keyspaces.

In [None]:
import json
import os
import sys

import boto3

print("test auth") 

boto3_bedrock = boto3.client('bedrock')

boto3_bedrock.list_foundation_models()


In [None]:
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT
from ssl import SSLContext, PROTOCOL_TLSv1_2 , CERT_REQUIRED
from cassandra.auth import PlainTextAuthProvider
import boto3
from cassandra_sigv4.auth import SigV4AuthProvider
from cassandra import ConsistencyLevel

ssl_context = SSLContext(PROTOCOL_TLSv1_2)

boto_session = boto3.Session()

auth_provider = SigV4AuthProvider(boto_session)

profile = ExecutionProfile(
    consistency_level=ConsistencyLevel.LOCAL_QUORUM
)

cluster = Cluster(['cassandra.us-east-1.amazonaws.com'], ssl_context=ssl_context, auth_provider=auth_provider,
                  port=9142, execution_profiles={EXEC_PROFILE_DEFAULT: profile})

session = cluster.connect()

r = session.execute('select * from system_schema.keyspaces')

print(r.current_rows[0])



---
### Create a new Keyspace

A keyspace is a logical collection of tables. For this notebook we will create a new keyspace to hold the tables used in this example. In Amazon Keyspaces, resources are created asynchronously. After we create the new keyspace, we poll once per second, up to five times, to check whether it is available. Until the keyspace is created, the query returns no rows. At this point we are using only the Cassandra Python driver.

In [45]:

import time

session.execute("CREATE KEYSPACE if not exists keyspaces_genai WITH REPLICATION = {'class':'SingleRegionStrategy'}")


query = "SELECT * FROM system_schema_mcs.keyspaces WHERE keyspace_name = 'keyspaces_genai'"

for x in range(5):
    result = session.execute(query)
    if result:
        print(result.current_rows)
        break
    time.sleep(1)


[Row(keyspace_name='keyspaces_genai', durable_writes=True, replication=OrderedMapSerializedKey([('class', 'org.apache.cassandra.locator.SimpleStrategy'), ('replication_factor', '3')]))]
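
The wait-and-check loop above can be factored into a small reusable helper. The following is a minimal sketch, not part of the Cassandra driver: `wait_for` is a hypothetical name, and the iterator of canned responses stands in for repeated calls to `session.execute(query)` so that the example runs without a cluster.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_for(check: Callable[[], Optional[T]], attempts: int = 5, delay: float = 1.0) -> Optional[T]:
    """Call check() until it returns a truthy value or the attempts run out."""
    for attempt in range(attempts):
        result = check()
        if result:
            return result
        # Sleep only between attempts, not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return None


# Stand-in for the system table query: no rows twice, then one row.
_responses = iter([[], [], [{"keyspace_name": "keyspaces_genai"}]])
rows = wait_for(lambda: next(_responses), attempts=5, delay=0.01)
print(rows)
```

With the real driver, the check could be something like `lambda: session.execute(query).current_rows`. A `None` result then means the resource did not appear within the allotted attempts.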


--- 
## LangChain chat history

In many Q&A applications we want to allow the user to have a back-and-forth conversation with GenAI, meaning the application maintains “memory” of past questions and answers, and some logic for incorporating those into its current thinking.

To use Amazon Keyspaces as the store for LangChain chat history, pass in the session and specify the keyspace for the message history. This creates a new table in the keyspace called "message_store". The message store is a simple model with a primary key made up of a partition_id and a row_id, plus a body_blob text column that stores the raw message. The partition_id separates the history of each user session, so the number of user sessions can scale out horizontally in Amazon Keyspaces.
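
To make the data model concrete, here is a minimal in-memory sketch of how messages are grouped by partition_id and ordered by row_id. It is an illustration only and does not touch Amazon Keyspaces. The integer row_id, the JSON-like message bodies, and the `add_message` and `get_messages` helpers are assumptions made for this example, not the actual table contents or library API.

```python
from collections import defaultdict
from itertools import count

# In-memory stand-in for the message_store table.
# Key: partition_id (the session ID); value: list of (row_id, body_blob) rows.
message_store = defaultdict(list)
_next_row_id = count(1)  # hypothetical substitute for the real row_id values


def add_message(partition_id: str, body_blob: str) -> None:
    """Append one raw message to the partition for this session."""
    message_store[partition_id].append((next(_next_row_id), body_blob))


def get_messages(partition_id: str) -> list:
    """Return the message bodies of one session, ordered by row_id."""
    rows = sorted(message_store.get(partition_id, []))
    return [body for _, body in rows]


add_message("session-a", '{"type": "human", "content": "Hi"}')
add_message("session-b", '{"type": "human", "content": "Bonjour"}')
add_message("session-a", '{"type": "ai", "content": "Hello!"}')

print(get_messages("session-a"))
print(get_messages("session-b"))
```

Reading one session touches only that session's partition, which is why adding more sessions spreads the load rather than growing a single hot partition.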


In [None]:
from langchain_community.chat_message_histories import (
    CassandraChatMessageHistory,
)

message_history = CassandraChatMessageHistory(
    session_id="test-user-session",
    session=session,
    keyspace='keyspaces_genai',
)

## Like the "keyspaces_genai" keyspace created earlier, tables are created asynchronously.
## After creating the CassandraChatMessageHistory,
## make sure the new "message_store" table is active before storing messages.
## The following loop polls the status field until it reads 'ACTIVE'.
query = "select status, default_time_to_live, custom_properties from system_schema_mcs.tables where keyspace_name = 'keyspaces_genai' and table_name = 'message_store'"

for x in range(5):
    result = session.execute(query)
    if any(row[0] == 'ACTIVE' for row in result):
        print(result.current_rows)
        break
    time.sleep(1)



---
### Test CassandraChatMessageHistory functionality

Now that the message_store table is active, you can use its CRUD operations to add and retrieve message history. Later we will integrate it directly with the chat model so that you do not have to add and retrieve messages manually.

In [None]:
 
message_history.add_user_message("What age should I stop playing basketball?")

message_history.add_ai_message("The answer to the ultimate question of life, the universe, and everything is 42")

print(await message_history.aget_messages())

---
### Connect to ChatBedrock

ChatBedrock is the LangChain chat model integration for [Amazon Bedrock](https://aws.amazon.com/bedrock/). Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don’t have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with.

In the following notebook cell you will create a chat model that uses Amazon Bedrock and Anthropic Claude 3 Sonnet. You will translate a sentence from English to French and stream the result. Streaming lets you return the response in chunks without waiting for the full generated response.
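
As a rough illustration of what consuming a stream looks like, the sketch below uses a plain Python generator in place of a model. `stream_chunks` is a hypothetical stand-in, and the fixed chunk size is an assumption for the example; a real model decides its own chunk boundaries.

```python
from typing import Iterator


def stream_chunks(text: str, chunk_size: int = 4) -> Iterator[str]:
    """Yield the text in fixed-size pieces, as a stand-in for a streamed reply."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]


reply = "Hello World!"
received = []
for chunk in stream_chunks(reply):
    # Each chunk can be shown to the user as soon as it arrives.
    print(chunk, end="", flush=True)
    received.append(chunk)
print()
```

The consumer loop has the same shape as the `for chunk in llm.stream(messages)` loop used with the chat model: output appears piece by piece instead of after the whole reply is ready.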


In [46]:
from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage

llm = ChatBedrock(
    model_id="anthropic.claude-3-sonnet-20240229-v1:0",
    model_kwargs={"temperature": 0.1},
)

messages = [
    HumanMessage(
        content="Translate this sentence from English to French. \"Hello World!\""
    )
]
llm.invoke(messages)

for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)

Voici la traduction en français :

"Bonjour le monde !"

### Test the LLM "memory"

Now we will test the LLM's ability to recall the previous request.


In [49]:
messages = [
    HumanMessage(
        content="Do you recall what I asked you to translate?"
    )
]
llm.invoke(messages)
for chunk in llm.stream(messages):
    print(chunk.content, end="", flush=True)

I'm afraid I don't have any specific context about a previous translation request from you. I don't have a long-term memory of our prior conversations. Could you please restate what you need translated?

---
### Integrating chat history with the chat model

Previously you updated and retrieved chat history manually. You could insert that history into each input yourself, but in a real Q&A application we want some way of persisting chat history and some way of automatically inserting and updating it. Doing this behind a common interface also lets you switch out the underlying LLM, or the chat history store, for another.


In [None]:
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables.history import RunnableWithMessageHistory

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", "You are a helpful assistant."),
        MessagesPlaceholder(variable_name="history"),
        ("human", "{question}"),
    ]
)

chain = prompt | llm

chain_with_history = RunnableWithMessageHistory(
    chain,
    # Factory that returns the Keyspaces-backed history for a given session ID.
    lambda session_id: CassandraChatMessageHistory(
        session_id=session_id,
        session=session,
        keyspace='keyspaces_genai',
    ),
    input_messages_key="question",
    history_messages_key="history",
)

config = {"configurable": {"session_id": "mike-session"}}

chain_with_history.invoke({"question": "Hi! I'm Mike"}, config=config)


In [None]:
chain_with_history.invoke({"question": "Hey, do you recall my name?"}, config=config)

In [None]:
chain_with_history.invoke({"question": "How do you remember my name? Do you store it in Amazon Keyspaces?"}, config=config)
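
The pattern used in the cells above (load the history for a session, send it with the new question, then save both sides of the turn) can be sketched without any external services. This is a simplified illustration, not how RunnableWithMessageHistory is implemented: `fake_llm`, `WithHistory`, and the in-memory `store` are hypothetical stand-ins for the Bedrock model, the LangChain wrapper, and the Keyspaces table.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Message = Tuple[str, str]  # (role, content)
GREETING = "Hi! I'm "


def fake_llm(messages: List[Message]) -> str:
    """Hypothetical stateless model: it only sees the messages passed in."""
    earlier, question = messages[:-1], messages[-1][1]
    if "name" not in question.lower():
        return "Hello!"
    for role, content in earlier:
        if role == "human" and content.startswith(GREETING):
            return "Your name is " + content[len(GREETING):] + "."
    return "I don't know your name."


class WithHistory:
    """Load a session's history, call the model, then save the new turn."""

    def __init__(self, model, get_history):
        self.model = model
        self.get_history = get_history

    def invoke(self, question: str, session_id: str) -> str:
        history = self.get_history(session_id)
        answer = self.model(history + [("human", question)])
        history.extend([("human", question), ("ai", answer)])
        return answer


# In-memory stand-in for the Keyspaces-backed store, keyed by session ID.
store: Dict[str, List[Message]] = defaultdict(list)
chat = WithHistory(fake_llm, lambda session_id: store[session_id])

chat.invoke("Hi! I'm Mike", session_id="mike-session")
print(chat.invoke("Hey, do you recall my name?", session_id="mike-session"))
print(chat.invoke("Hey, do you recall my name?", session_id="other-session"))
```

The model function never sees the store itself, only the messages handed to it, and a different session ID yields an independent history. Swapping the history factory or the model does not require changing the other.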

## Summary 

As you can see, the LLM does not know that you are storing the history in Amazon Keyspaces, that you implemented CassandraChatMessageHistory, or even that you are using LangChain. The messages stored by CassandraChatMessageHistory are passed as input to the underlying LLM. Using Amazon Keyspaces allows you to store chat history with single-digit millisecond latencies and to scale as the number of users grows. In other examples we will show how Amazon Keyspaces can be used as a document loader and an LLM cache.