<a href="https://colab.research.google.com/github/YoshiyukiKono/gen_ai-sandbox/blob/main/bedrock.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Vector Similarity Astra-Bedrock Search QA Quickstart

Set up a simple Question-Answering system with LangChain and Amazon Bedrock, using Astra DB as the Vector Database.

## Prerequisites

Make sure you have a vector-capable Astra database (get one for free at [astra.datastax.com](https://astra.datastax.com)):

- You will be asked to provide the **Database ID** for your Astra DB instance (see [here](https://awesome-astra.github.io/docs/pages/astra/faq/#where-should-i-find-a-database-identifier) for details);
- Ensure you have an **Access Token** for your database with role _Database Administrator_ (see [here](https://awesome-astra.github.io/docs/pages/astra/create-token/) for details).

Likewise, you will need credentials for an Amazon Web Services (AWS) identity that has access to **Amazon Bedrock**.
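You can sanity-check the two Astra DB prerequisites above before connecting. `check_astra_inputs` is a hypothetical helper, not part of cassio or Astra DB. It assumes that the Database ID is a UUID and that the token begins with `AstraCS:` (the prefix shown in the token prompt below). It checks only the format: it does not verify that the database exists or that the token has the _Database Administrator_ role.

```python
import uuid
from typing import List

def check_astra_inputs(database_id: str, token: str) -> List[str]:
    """Return a list of problems with the Astra DB ID and token (empty if none found)."""
    problems = []
    try:
        # Assumption: the Database ID is a UUID such as 01234567-89ab-cdef-0123-456789abcdef
        uuid.UUID(database_id.strip())
    except ValueError:
        problems.append("Database ID is not a valid UUID")
    # Assumption: application tokens start with 'AstraCS:', as in the prompt used later
    if not token.strip().startswith("AstraCS:"):
        problems.append("Token does not start with 'AstraCS:'")
    return problems

print(check_astra_inputs("01234567-89ab-cdef-0123-456789abcdef", "AstraCS:example"))  # []
print(check_astra_inputs("my-database", "abc"))
```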

## Set up your Python environment

In [1]:
%pip install --quiet \
  "cassio>=0.1.3" \
  "langchain==0.0.249" \
  "boto3==1.28.62" \
  "botocore==1.31.62" \
  "cohere==4.37" \
  "openai==1.3.7" \
  "tiktoken==0.5.2" \
  "awscli==1.29.62"


## Import needed libraries

In [2]:
import json
import os
import sys
from getpass import getpass


import boto3
import cassio

from langchain.embeddings import BedrockEmbeddings
from langchain.llms import Bedrock
from langchain.vectorstores import Cassandra
from langchain.schema import Document
from langchain.prompts import PromptTemplate
from langchain.document_loaders import TextLoader

## Astra DB Setup

In [3]:
ASTRA_DB_ID = input("Enter your Astra DB ID ('0123abcd-'):")
ASTRA_DB_APPLICATION_TOKEN = getpass("Enter your Astra DB Token ('AstraCS:...'):")
ASTRA_DB_KEYSPACE = input("Enter your keyspace name (optional, default keyspace used if not provided):")

Enter your Astra DB ID ('0123abcd-'):ebe83fdc-873b-4c91-b923-80b6c5e82f27
Enter your Astra DB Token ('AstraCS:...'):··········
Enter your keyspace name (optional, default keyspace used if not provided):bedrock


In [4]:
cassio.init(
    token=ASTRA_DB_APPLICATION_TOKEN,
    database_id=ASTRA_DB_ID,
    keyspace=ASTRA_DB_KEYSPACE if ASTRA_DB_KEYSPACE else None,
)

ERROR:cassandra.connection:Closing connection <AsyncoreConnection(140568702113904) ebe83fdc-873b-4c91-b923-80b6c5e82f27-us-east-2.db.astra.datastax.com:29042:36887b67-fd8a-4452-9d77-6c6fe16ea1b1> due to protocol error: Error from server: code=000a [Protocol error] message="Beta version of the protocol used (5/v5-beta), but USE_BETA flag is unset"


## AWS Credentials Setup

_Note_: in the following cells you will be asked to explicitly provide the credentials for your AWS account. These are set as environment variables for use by the subsequent `boto3` calls. Please refer to [boto3's documentation](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/credentials.html) for other ways to supply your credentials in a more production-like environment.

In particular, if you are running this notebook in **Amazon SageMaker Studio**, you only need to add the Bedrock policy to your SageMaker role, as outlined at [this link](https://github.com/aws-samples/amazon-bedrock-workshop#enable-aws-iam-permissions-for-bedrock), to access Amazon Bedrock. In that case you can skip the following three setup cells.
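Before calling Bedrock, you may want to confirm that the environment variables set below are actually present. `missing_aws_env_vars` is a hypothetical helper that inspects only the environment. boto3 also looks in other places, such as the shared credentials file or an attached IAM role. An unset variable is therefore not necessarily an error, for example on SageMaker with a role, as described above.

```python
import os
from typing import List, Mapping, Optional

# Variables boto3 reads from the environment for access-key credentials.
# AWS_SESSION_TOKEN is also read, but is only needed for temporary (e.g. STS-issued) keys.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")

def missing_aws_env_vars(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required credential variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]

print(missing_aws_env_vars({"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE"}))  # ['AWS_SECRET_ACCESS_KEY']
```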

In [5]:
# Input your AWS Access Key ID
os.environ["AWS_ACCESS_KEY_ID"] = getpass("Your AWS Access Key ID:")

Your AWS Access Key ID:··········


In [6]:
# Input your AWS Secret Access Key
os.environ["AWS_SECRET_ACCESS_KEY"] = getpass("Your AWS Secret Access Key:")

Your AWS Secret Access Key:··········


In [7]:
# Input your AWS Session Token
os.environ["AWS_SESSION_TOKEN"] = getpass("Your AWS Session Token:")

Your AWS Session Token:··········


## Set up Amazon Bedrock objects

In [8]:
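# Client for the Bedrock runtime API in us-west-2; the models used below must be enabled for your account in that region
bedrock_runtime = boto3.client("bedrock-runtime", region_name="us-west-2")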
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1",
                                       client=bedrock_runtime)

## Set up the Vector Store

This command will create a suitable table in your database if it does not exist yet:

In [9]:
vector_store = Cassandra(
    embedding=bedrock_embeddings,
    table_name="shakespeare_act5",
    session=None,  # <-- meaning: use the global defaults from cassio.init()
    keyspace=None,  # <-- meaning: use the global defaults from cassio.init()
)
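Conceptually, the table stores one embedding vector per text. A query is answered by embedding the query text and returning the stored rows whose vectors are most similar to it. The sketch below is a toy, in-memory illustration of that idea only. The keyword-count "embedding" is a hypothetical stand-in for the Titan embeddings, cosine similarity is an assumption made for the example, and the exhaustive scan replaces the approximate nearest-neighbour index that a vector database such as Astra DB uses server-side.

```python
import math
import re
from typing import Callable, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical "embedding": counts of a few keywords (Titan returns dense float vectors instead)
VOCAB = ["dead", "die", "watchman", "vault", "love"]

def toy_embed(text: str) -> List[float]:
    words = re.findall(r"[a-z']+", text.lower())
    return [float(words.count(w)) for w in VOCAB]

class ToyVectorStore:
    """In-memory stand-in for the vector table: one (vector, text) row per document."""

    def __init__(self, embed: Callable[[str], List[float]]):
        self.embed = embed
        self.rows: List[Tuple[List[float], str]] = []

    def add_texts(self, texts: List[str]) -> None:
        # Texts are embedded once, at insertion time
        for text in texts:
            self.rows.append((self.embed(text), text))

    def similarity_search(self, query: str, k: int = 4) -> List[str]:
        # The query is embedded with the same function, then compared to every row
        q = self.embed(query)
        ranked = sorted(self.rows, key=lambda row: cosine(q, row[0]), reverse=True)
        return [text for _, text in ranked[:k]]

store = ToyVectorStore(toy_embed)
store.add_texts(["And Romeo dead, and Astra, dead before,",
                 "Lead, boy: which way?",
                 "Came to this vault to die, and lie with Astra."])
print(store.similarity_search("Who is dead?", k=1))
```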

## Populate the database

Add the lines of "Romeo and Astra", Act 5, Scene 3.

In [10]:
# Retrieve the text of a scene from Act 5 of "Romeo and Astra".
# Juliet's name was changed to Astra to prevent the LLM from "cheating" when providing an answer.
! mkdir -p "texts"
! curl "https://raw.githubusercontent.com/awesome-astra/docs/main/docs/pages/aiml/aws/bedrock_resources/romeo_astra.json" \
    --output "texts/romeo_astra.json"
with open("texts/romeo_astra.json") as f:
    input_lines = json.load(f)

  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 75985  100 75985    0     0   252k      0 --:--:-- --:--:-- --:--:--  252k
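Each entry in `romeo_astra.json` is a JSON object. Judging from the loop in the next cell, it has at least the keys `ActSceneLine` (a dotted act/scene/line reference), `Player` and `PlayerLine`. The function below is a hypothetical refactoring of that loop into a testable helper. The sample record is reconstructed from a line retrieved later in this notebook and is not copied from the file.

```python
from typing import Dict, Tuple

def to_document_text(record: Dict[str, str]) -> Tuple[str, Dict[str, str]]:
    """Build (page_content, metadata) for one entry, mirroring the loop in the next cell."""
    if record["ActSceneLine"] != "":
        act, scene, line = record["ActSceneLine"].split(".")
        location = f"Act {act}, Scene {scene}, Line {line}"
        metadata = {"act": act, "scene": scene, "line": line}
    else:
        # Entries with an empty reference get no location and no metadata
        location, metadata = "", {}
    return f"{location} : {record['Player']} : {record['PlayerLine']}", metadata

# Hypothetical record, reconstructed from a line retrieved later in this notebook
sample = {"ActSceneLine": "5.3.184", "Player": "First Watchman",
          "PlayerLine": "And Astra bleeding, warm, and newly dead,"}
text, metadata = to_document_text(sample)
print(text)  # Act 5, Scene 3, Line 184 : First Watchman : And Astra bleeding, warm, and newly dead,
```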


Next, you'll populate the database with the lines from the play.
There are 321 lines in total, and this can take a couple of minutes, so please be patient.


In [11]:
input_documents = []

for input_line in input_lines:
    if input_line["ActSceneLine"] != "":
        (act, scene, line) = input_line["ActSceneLine"].split(".")
        location = "Act {}, Scene {}, Line {}".format(act, scene, line)
        metadata = {"act": act, "scene": scene, "line": line}
    else:
        location = ""
        metadata = {}
    quote_input = "{} : {} : {}".format(location, input_line["Player"], input_line["PlayerLine"])
    input_document = Document(page_content=quote_input, metadata=metadata)
    input_documents.append(input_document)

print(f"Adding {len(input_documents)} documents ... ", end="")
vector_store.add_documents(documents=input_documents, batch_size=50)
print("Done.")

Adding 321 documents ... Done.
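The `batch_size=50` argument above groups the writes. Assuming it processes the documents in consecutive groups of 50, the 321 lines split into six full groups plus a final group of 21. The hypothetical `batches` helper below only illustrates that arithmetic. It does not show how LangChain's `Cassandra` store schedules its embedding calls or database writes internally.

```python
from typing import Iterator, Sequence

def batches(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

sizes = [len(b) for b in batches(range(321), 50)]
print(sizes)  # [50, 50, 50, 50, 50, 50, 21]
```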


## Answer questions

In [12]:
prompt_template_str = """Human: Use the following pieces of context to provide a concise answer to the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.

<context>
{context}
</context>

Question: {question}

Assistant:"""

prompt = PromptTemplate.from_template(prompt_template_str)

We use the following LLM (see [this page](https://docs.aws.amazon.com/bedrock/latest/userguide/model-parameters.html#model-parameters-general) for its inference parameters):

In [13]:
model_id = "anthropic.claude-v2"
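With `anthropic.claude-v2`, the request and response bodies are JSON documents. The request carries `prompt` and `max_tokens_to_sample`, and the reply carries the generated text under `completion`; these are the fields used in the next cell. The sketch below isolates that encoding and decoding so it can be tested offline. The helper names are hypothetical, and `io.BytesIO` stands in for the streaming body that `invoke_model` returns.

```python
import io
import json

def build_claude_body(prompt: str, max_tokens: int = 500) -> str:
    # Text-completion request body, with the same fields answer_question() sends
    return json.dumps({"prompt": prompt, "max_tokens_to_sample": max_tokens})

def parse_claude_response(response: dict) -> str:
    # The payload is a stream under "body"; the generated text is under "completion"
    payload = json.loads(response["body"].read())
    return payload["completion"].strip()

# Offline check with a fake response (io.BytesIO stands in for botocore's StreamingBody)
fake = {"body": io.BytesIO(json.dumps({"completion": " Romeo and Astra."}).encode())}
print(parse_claude_response(fake))  # Romeo and Astra.
```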

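Next, we set up the question-answering function. It implements the retrieval-augmented generation (RAG) pattern: it retrieves the most relevant lines from the vector store, inserts them into the prompt, and asks the LLM for an answer.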

In [14]:
req_accept = "application/json"
req_content_type = "application/json"

# The retriever, created from the vector store, fetches the k=5 most relevant documents for a text query
retriever = vector_store.as_retriever(search_kwargs={"k": 5})

def answer_question(question: str, verbose: bool = False) -> str:
    if verbose:
        print(f"\n[answer_question] Question: {question}")
    # Retrieval of the most relevant stored documents from the vector store:
    context_docs = retriever.get_relevant_documents(question)
    context = "\n".join(doc.page_content for doc in context_docs)
    if verbose:
        print("\n[answer_question] Context:")
        print(context)
    # Filling the prompt template with the current values
    llm_prompt_str = prompt.format(
        question=question,
        context=context,
    )
    # Invocation of the Amazon Bedrock LLM for text completion -- ultimately obtaining the answer
    llm_body = json.dumps({"prompt": llm_prompt_str, "max_tokens_to_sample": 500})
    llm_response = bedrock_runtime.invoke_model(
        body=llm_body,
        modelId=model_id,
        accept=req_accept,
        contentType=req_content_type,
    )
    llm_response_body = json.loads(llm_response["body"].read())
    answer = llm_response_body["completion"].strip()
    if verbose:
        print(f"\n[answer_question] Answer: {answer}\n")
    return answer
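Without the Astra DB and Bedrock specifics, `answer_question` performs three steps: retrieve, augment the prompt, and generate. The sketch below expresses that structure with injected callables, so the flow can be exercised with stubs and without network access. `make_answerer`, the stub retriever and `fake_complete` are hypothetical. In this notebook, `retrieve` would wrap `retriever.get_relevant_documents` (taking each document's `page_content`), and `complete` would wrap the `bedrock_runtime.invoke_model` call.

```python
from typing import Callable, List

def make_answerer(retrieve: Callable[[str], List[str]],
                  complete: Callable[[str], str],
                  template: str) -> Callable[[str], str]:
    """Wire retrieval, prompt filling and completion together (the RAG pattern)."""
    def answer(question: str) -> str:
        context = "\n".join(retrieve(question))                        # 1. retrieve
        prompt = template.format(context=context, question=question)  # 2. augment
        return complete(prompt).strip()                                # 3. generate
    return answer

captured: List[str] = []

def fake_complete(prompt: str) -> str:
    captured.append(prompt)  # keep the prompt so it can be inspected
    return "  Astra and Romeo.  "

answer = make_answerer(
    retrieve=lambda q: ["PRINCE : Came to this vault to die, and lie with Astra."],
    complete=fake_complete,
    template="Context:\n{context}\nQuestion: {question}\nAnswer:",
)
print(answer("Who dies?"))  # Astra and Romeo.
```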

In [15]:
my_answer = answer_question("Who dies in the story?")
print("=" * 60)
print(my_answer)



Based on the provided context, it seems that Astra and Romeo both die in the story. The watchman refers to Astra being found dead and bleeding. Balthasar brings news of Astra's death to Romeo. Romeo is also found dead in the same vault where Astra lies dead. So the context indicates that both Astra and Romeo die in the story.


Let's look at the RAG process step by step by asking the same question with verbose output:

In [16]:
my_answer = answer_question("Who dies in the story?", verbose=True)
print("=" * 60)
print(my_answer)


[answer_question] Question: Who dies in the story?





[answer_question] Context:
Act 5, Scene 3, Line 184 : First Watchman : And Astra bleeding, warm, and newly dead,
Act 5, Scene 3, Line 300 : PRINCE : Came to this vault to die, and lie with Astra.
Act 5, Scene 3, Line 282 : BALTHASAR : I brought my master news of Astra's death,
Act 5, Scene 3, Line 206 : First Watchman : Warm and new kill'd.
Act 5, Scene 3, Line 205 : First Watchman : And Romeo dead, and Astra, dead before,

[answer_question] Answer: Based on the provided context, it seems that Astra and Romeo both die in the story. The watchman finds Astra dead in the vault, and later finds Romeo dead there as well. Balthasar brings news to Romeo that Astra is dead, which leads Romeo to go to the vault to die beside Astra. So the characters who die are Astra and Romeo.


### Interactive QA session

In [17]:
user_question = ""
while True:
    user_question = input("Enter a question (empty to quit):").strip()
    if user_question:
        print(f"Answer ==> {answer_question(user_question)}")
    else:
        print("[User, AI exeunt]")
        break

Enter a question (empty to quit):Who kills Romeo?




Answer ==> Based on the provided context, I don't have enough information to determine who kills Romeo. The lines mention that Romeo is dead, but do not specify who killed him.
Enter a question (empty to quit):
[User, AI exeunt]


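## Asking questions in Japanese

The same pipeline can also be queried in Japanese. The cells below ask 物語の中で死んだのは誰? ("Who died in the story?") and 城の見回りをしているのは誰? ("Who is patrolling the castle?"). The retrieval step and the English prompt template are unchanged, and the model replies in Japanese.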

In [18]:
my_answer = answer_question("物語の中で死んだのは誰?", verbose=True)
print("=" * 60)
print(my_answer)


[answer_question] Question: 物語の中で死んだのは誰?





[answer_question] Context:
Act 5, Scene 3, Line 320 : PRINCE : For never was a story of more woe
Act 5, Scene 3, Line 51 : PARIS : It is supposed, the fair creature died,
Act 5, Scene 3, Line 318 : PRINCE : Go hence, to have more talk of these sad things,
Act 5, Scene 3, Line 207 : PRINCE : Search, seek, and know how this foul murder comes.
Act 5, Scene 3, Line 303 : PRINCE : That heaven finds means to kill your joys with love.

[answer_question] Answer: コンテキストから判断すると、Julietが死んだと考えられます。Act 5, Scene 3, Line 51でParisが"fair creature"が死んだと言っています。これはJulietを指していると思われ、物語の中でJulietが死んだことが示唆されています。正確にはコンテキストからは判断できませんが、Julietの死を示唆していると考えられます。

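English translation of the answer above: "Judging from the context, it appears that Juliet died. In Act 5, Scene 3, Line 51, Paris says that the 'fair creature' died. This seems to refer to Juliet, which suggests that Juliet died in the story. This cannot be determined exactly from the context, but it appears to suggest Juliet's death."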


In [21]:
my_answer = answer_question("城の見回りをしているのは誰?", verbose=True)
print("=" * 60)
print(my_answer)


[answer_question] Question: 城の見回りをしているのは誰?





[answer_question] Context:
Act 5, Scene 3, Line 2 : PARIS : Yet put it out, for I would not be seen.
Act 5, Scene 3, Line 195 : Third Watchman : As he was coming from this churchyard side.
Act 5, Scene 3, Line 176 : First Watchman : [Within]  Lead, boy: which way?
Act 5, Scene 3, Line 6 : PARIS : Being loose, unfirm, with digging up of graves,
Act 5, Scene 3, Line 284 : BALTHASAR : To this same place, to this same monument.

[answer_question] Answer: 私は確信はありませんが、文脈から判断すると、見回りをしているのはwatchmanたちのようです。特に「First Watchman」と「Third Watchman」が登場しています。よって、見回りをしているのはおそらくwatchmanたちだと思いますが、はっきりとは言えません。申し訳ありませんが、確かな答えは分かりません。

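English translation of the answer above: "I'm not certain, but judging from the context, the ones patrolling seem to be the watchmen. In particular, the 'First Watchman' and the 'Third Watchman' appear. So I think the ones patrolling are probably the watchmen, but I can't say for sure. I'm sorry, but I don't know the definite answer."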


## Additional resources

To learn more about Amazon Bedrock, visit this page: [Introduction to Amazon Bedrock](https://github.com/aws-samples/amazon-bedrock-samples/tree/main/introduction-to-bedrock).