# Vector Similarity Astra-Bedrock Search QA Quickstart

Set up a simple Question-Answering system with LangChain and AWS Bedrock, using Cassandra as the Vector Database.

Make sure you have a database, and be ready to upload the Secure Connect Bundle and supply the Token string
(see [Pre-requisites](https://cassio.org/start_here/#vector-database) on cassio.org for details; remember that you need a **custom Token** with the [Database Administrator](https://awesome-astra.github.io/docs/pages/astra/create-token/) role).

Likewise, ensure you have the AWS credentials for Bedrock, the LLM provider used here: you'll be asked to input them shortly
(see [Pre-requisites](https://cassio.org/start_here/#llm-access) on cassio.org for details).



### Set up your Python environment




In [1]:
!pwd
!curl -H 'Cache-Control: no-cache, no-store'  https://raw.githubusercontent.com/synedra/astra_vector_examples/main/notebook/requirements_bedrock.txt --output requirements.txt
!curl https://d2eo22ngex1n9g.cloudfront.net/Documentation/SDK/bedrock-python-sdk.zip --output bedrock-python-sdk.zip
!unzip -o bedrock-python-sdk.zip
!pip install -r requirements.txt


### Import needed libraries



In [2]:
# Standard library
import json
import os
import sys
from getpass import getpass

# Third-party: AWS SDK, Cassandra driver, LangChain
import boto3
from cassandra.auth import PlainTextAuthProvider
from cassandra.cluster import Cluster
from langchain.docstore.document import Document
from langchain.document_loaders import TextLoader
from langchain.embeddings import BedrockEmbeddings
from langchain.prompts import PromptTemplate
from langchain.vectorstores.cassandra import Cassandra


### Astra Setup



### Astra DB Secure Connect Bundle

Please upload the Secure Connect Bundle zipfile to connect to your Astra DB instance.

The Secure Connect Bundle is needed to establish a secure connection to the database.
Click [here](https://awesome-astra.github.io/docs/pages/astra/download-scb/#c-procedure) for instructions on how to download it from Astra DB.

In [None]:
# Upload your Secure Connect Bundle zipfile:
import os
from google.colab import files


print('Please upload your Secure Connect Bundle')
uploaded = files.upload()
if uploaded:
    astraBundleFileTitle = list(uploaded.keys())[0]
    ASTRA_DB_SECURE_BUNDLE_PATH = os.path.join(os.getcwd(), astraBundleFileTitle)
else:
    raise ValueError(
        'Cannot proceed without Secure Connect Bundle. Please re-run the cell.'
    )

In [None]:
# Input your database keyspace name:
ASTRA_DB_KEYSPACE = input('Your Astra DB Keyspace name (e.g. vector): ')

In [None]:
# Input your Astra DB token string, the one starting with "AstraCS:..."
from getpass import getpass
ASTRA_DB_APPLICATION_TOKEN = getpass('Your Astra DB Token ("AstraCS:..."): ')
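
Before opening a connection, it can help to sanity-check the two values collected above. The following is a minimal sketch, not part of the original notebook: `check_astra_inputs` is a hypothetical helper, and the only rules it applies are the ones stated in this guide (the bundle is a zipfile, and the token starts with `AstraCS:`). The demo uses an empty placeholder file and a made-up token.

```python
import os
import tempfile


def check_astra_inputs(bundle_path, token):
    """Return a list of problems found; an empty list means the inputs look plausible."""
    problems = []
    if not os.path.isfile(bundle_path):
        problems.append("Secure Connect Bundle not found: " + bundle_path)
    elif not bundle_path.lower().endswith(".zip"):
        problems.append("Secure Connect Bundle should be a .zip file")
    if not token.startswith("AstraCS:"):
        problems.append('Token should start with "AstraCS:"')
    return problems


# Demonstration with a throwaway file and made-up tokens (assumptions for illustration).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_bundle = os.path.join(tmp_dir, "secure-connect-demo.zip")
    with open(demo_bundle, "wb") as handle:
        handle.write(b"")  # empty placeholder, not a real bundle
    ok_result = check_astra_inputs(demo_bundle, "AstraCS:example")
    bad_result = check_astra_inputs(os.path.join(tmp_dir, "missing.zip"), "wrong")

print(ok_result)
print(bad_result)
```

In the notebook itself, the call would presumably be `check_astra_inputs(ASTRA_DB_SECURE_BUNDLE_PATH, ASTRA_DB_APPLICATION_TOKEN)`. The check is only syntactic: it cannot tell whether the token is valid or has the right role.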

In [None]:
# Colab-specific override of helper functions
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider


def getCQLSession(mode='astra_db'):
    if mode == 'astra_db':
        cluster = Cluster(
            cloud={
                "secure_connect_bundle": ASTRA_DB_SECURE_BUNDLE_PATH,
            },
            auth_provider=PlainTextAuthProvider(
                "token",
                ASTRA_DB_APPLICATION_TOKEN,
            ),
        )
        astraSession = cluster.connect()
        return astraSession
    else:
        raise ValueError('Unsupported CQL Session mode')

def getCQLKeyspace(mode='astra_db'):
    if mode == 'astra_db':
        return ASTRA_DB_KEYSPACE
    else:
        raise ValueError('Unsupported CQL Session mode')

In [2]:
# creation of the DB connection
cqlMode = 'astra_db'
session = getCQLSession(mode=cqlMode)
keyspace = getCQLKeyspace(mode=cqlMode)

### AWS Credential Setup



In [None]:
# Input your AWS Access Key ID
os.environ["AWS_ACCESS_KEY_ID"] = getpass('Your AWS Access Key ID: ')

In [None]:
# Input your AWS Secret Access Key
os.environ["AWS_SECRET_ACCESS_KEY"] = getpass('Your AWS Secret Access Key: ')

In [None]:
# Input your AWS Session Token
os.environ["AWS_SESSION_TOKEN"] = getpass('Your AWS Session Token: ')
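
A quick way to confirm that the three cells above were all run is to look for empty or missing variables. This is a small sketch under stated assumptions: `missing_aws_vars` is a hypothetical helper, and it treats all three variables as required only because this notebook prompts for all three. The demo environment is a plain dictionary with made-up values.

```python
import os

# The three variables this notebook sets in the cells above.
REQUIRED_AWS_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_SESSION_TOKEN",
)


def missing_aws_vars(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_AWS_VARS if not env.get(name)]


# Made-up environment: one value present, one empty, one absent.
demo_env = {"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": ""}
demo_missing = missing_aws_vars(demo_env)
print(demo_missing)
```

Calling `missing_aws_vars()` with no argument inspects the real `os.environ`.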

### Set up AWS Bedrock objects



In [3]:
bedrock = boto3.client(
    'bedrock',
    'us-west-2',
    endpoint_url='https://invoke-bedrock.us-west-2.amazonaws.com',
)
br_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-g1-text-02", client=bedrock)


### Set up a vectorstore in Astra

In [4]:
myCassandraVStore = Cassandra(
    embedding=br_embeddings,
    session=session,
    keyspace=ASTRA_DB_KEYSPACE,
    table_name='shakespeare_act5',
)


### Populate the database

Add the lines of Romeo and Astra, Act 5, Scene 3.



In [None]:
# Retrieve the text of a scene from Act 5 of Romeo and Astra.
# Juliet's name was changed to Astra to prevent the LLM from "cheating" when providing an answer.
! mkdir -p texts
! curl https://raw.githubusercontent.com/synedra/astra_vector_examples/main/romeo_astra.json --output texts/romeo_astra.json
with open('texts/romeo_astra.json') as quote_file:
    quote_array = json.load(quote_file)
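
The population step further down reads three keys from each record: `ActSceneLine`, `Player`, and `PlayerLine`. A loader that checks for them up front fails early instead of halfway through the upload. This is a minimal sketch, assuming the file is a JSON array of objects; `load_quotes` is a hypothetical helper, and the demo record is made up rather than taken from the real file.

```python
import json
import os
import tempfile

# Keys read from each record by the population loop in this notebook.
EXPECTED_KEYS = {"ActSceneLine", "Player", "PlayerLine"}


def load_quotes(path):
    """Load a JSON array of quote records and verify each has the expected keys."""
    with open(path, encoding="utf-8") as handle:
        records = json.load(handle)
    for position, record in enumerate(records):
        missing = EXPECTED_KEYS.difference(record)
        if missing:
            raise ValueError(
                "Record {} is missing keys: {}".format(position, sorted(missing))
            )
    return records


# Demonstration with a made-up record written to a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "demo.json")
    with open(demo_path, "w", encoding="utf-8") as handle:
        json.dump(
            [{"ActSceneLine": "5.3.1", "Player": "PARIS", "PlayerLine": "A made-up line."}],
            handle,
        )
    demo_quotes = load_quotes(demo_path)

print(len(demo_quotes))
```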


Next, populate the database with the lines from the play.
This can take a couple of minutes, so please be patient. There are 321 lines in total.


In [5]:
for quote in quote_array:
    # Turn "act.scene.line" into a readable location, when present
    location = ''
    if quote['ActSceneLine'] != '':
        (act, scene, line) = quote['ActSceneLine'].split('.')
        location = 'Act {}, Scene {}, Line {}'.format(act, scene, line)
    quote_input = '{} : {} : {}'.format(location, quote['Player'], quote['PlayerLine'])
    input_document = Document(page_content=quote_input)
    print(quote_input)
    # Embed the line and store it in the vector table
    myCassandraVStore.add_documents(documents=[input_document])
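
The formatting logic in the loop above can be pulled into a function and tried out without a database. The sketch below also groups the formatted lines into batches, since `add_documents` takes a list of documents. Both `format_quote` and `batches` are hypothetical helpers, the batch size is arbitrary, and the records are made up. Whether batching actually speeds up the upload has not been measured here.

```python
def format_quote(record):
    """Build the 'location : player : line' string stored for one record."""
    location = ""
    if record["ActSceneLine"] != "":
        act, scene, line = record["ActSceneLine"].split(".")
        location = "Act {}, Scene {}, Line {}".format(act, scene, line)
    return "{} : {} : {}".format(location, record["Player"], record["PlayerLine"])


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Made-up records, including one without a location.
demo_records = [
    {"ActSceneLine": "5.3.1", "Player": "PARIS", "PlayerLine": "first line"},
    {"ActSceneLine": "", "Player": "PAGE", "PlayerLine": "second line"},
    {"ActSceneLine": "5.3.3", "Player": "PARIS", "PlayerLine": "third line"},
]
demo_texts = [format_quote(record) for record in demo_records]
demo_batches = list(batches(demo_texts, 2))

for text in demo_texts:
    print(text)

# In the notebook, each batch could then be written with a single call, e.g.:
#   myCassandraVStore.add_documents([Document(page_content=t) for t in batch])
```

Records with an empty `ActSceneLine` keep an empty location, so their stored text starts with ` : `, exactly as in the original loop.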


### Ask a question
Enter a question and create the prompt template for the LLM query.  
Give instructions about what you want the model to do.


In [None]:
# Enter a question about Romeo and Astra (e.g. 'How did Astra die?')
QUESTION_FOR_MODEL = input('Your question about Romeo and Astra: ')

In [6]:
generation_prompt_template = """Please answer a question from a user.
Create a summary of the information between the # markers to answer the question.
Your task is to answer the question using only the summary, in 20 words.

#
{context}
#

question= {question}
Answer:"""


### Create a retriever
This lets you fetch the documents most relevant to the question from the database (here, the top two).


In [6]:
retriever = myCassandraVStore.as_retriever(search_kwargs={'k': 2})
output = retriever.get_relevant_documents(QUESTION_FOR_MODEL)
prompt = PromptTemplate.from_template("{page_content}")
# Concatenate the retrieved lines into a single context string
context = ''
for document in output:
    context += ' *** ' + document.page_content

print(context)
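
Conceptually, the retriever embeds the question and returns the `k` stored lines whose vectors are closest to it. The sketch below illustrates that idea with a brute-force search over three made-up vectors. It is not how the Cassandra store works internally: cosine similarity is chosen here only for illustration, and this notebook does not state which metric or index the store uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector, stored, k=2):
    """Return the texts of the k stored items most similar to the query."""
    scored = [(cosine_similarity(query_vector, vector), text) for text, vector in stored]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]


# Made-up three-dimensional "embeddings"; real ones have far more dimensions.
toy_store = [
    ("line about a tomb", [0.9, 0.1, 0.0]),
    ("line about a feast", [0.0, 1.0, 0.0]),
    ("line about poison", [0.8, 0.0, 0.6]),
]
toy_query = [1.0, 0.0, 0.2]
nearest = top_k(toy_query, toy_store, k=2)
print(nearest)
```

With these toy vectors the "feast" line is orthogonal to the query and is left out, so the two returned lines are the "tomb" and "poison" ones.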


### Build the LLM prompt


In [6]:
llm_prompt = generation_prompt_template.format(
    question=QUESTION_FOR_MODEL,
    context=context,
)
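
The context assembly and the template fill can be combined into one function, which makes the step easy to try without a database or a model. This is a minimal sketch: `build_prompt` is a hypothetical helper, `DEMO_TEMPLATE` is a shortened stand-in for the notebook's template, and the passages and question are made up.

```python
# Shortened stand-in for the notebook's prompt template (an assumption for illustration).
DEMO_TEMPLATE = """Answer the question using only the text between the # markers.

#
{context}
#

question= {question}
Answer:"""


def build_prompt(template, question, passages, separator=" *** "):
    """Join passages the way the retriever cell does, then fill the template."""
    context = "".join(separator + passage for passage in passages)
    return template.format(question=question, context=context)


demo_prompt = build_prompt(
    DEMO_TEMPLATE,
    question="Who speaks first?",
    passages=["first passage", "second passage"],
)
print(demo_prompt)
```

`str.format` only parses the template, not the substituted values, so curly braces inside a retrieved passage do not break the call.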


### Generate the Answer
Prepare the request for the prompt, using amazon.titan-tg1-large as the LLM.


In [6]:
body = json.dumps({'inputText': llm_prompt})
modelId = 'amazon.titan-tg1-large' 
accept = 'application/json'
contentType = 'application/json'


### Get the Answer


In [6]:
response = bedrock.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
print('Question: ' + QUESTION_FOR_MODEL)
print('Answer: ' + response_body.get('results')[0].get('outputText'))
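
The last line above assumes the response always contains a non-empty `results` list. A slightly more defensive version is sketched below. `extract_answer` is a hypothetical helper, and the sample dictionary only mirrors the shape used in the cell above (`results[0].outputText`). It is not a captured Bedrock response.

```python
def extract_answer(response_body):
    """Return the first generated text, or raise if the response has no results."""
    results = response_body.get("results") or []
    if not results:
        raise ValueError("Response contains no 'results' entries")
    return results[0].get("outputText", "").strip()


# Made-up response body with the same shape the notebook reads.
demo_body = {"results": [{"outputText": "\nA made-up answer.\n"}]}
demo_answer = extract_answer(demo_body)
print("Answer: " + demo_answer)
```

In the notebook, the call would presumably be `extract_answer(response_body)`. Stripping whitespace is a cosmetic choice and can be dropped if the raw text is needed.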
