# Vector Similarity Astra-Bedrock Search QA Quickstart

Set up a simple Question-Answering system with LangChain and AWS Bedrock, using Cassandra as the Vector Database.

Make sure you have a database, and be ready to upload the Secure Connect Bundle and supply the Token string
(see [Pre-requisites](https://cassio.org/start_here/#vector-database) on cassio.org for details; remember that you need a **custom Token** with the [Database Administrator](https://awesome-astra.github.io/docs/pages/astra/create-token/) role).

Likewise, ensure you have the necessary secrets for the LLM provider (here, AWS Bedrock): you'll be asked to input them shortly
(see [Pre-requisites](https://cassio.org/start_here/#llm-access) on cassio.org for details).



In [1]:
from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider
from langchain.embeddings import BedrockEmbeddings
from langchain.vectorstores.cassandra import Cassandra
from langchain.docstore.document import Document
from langchain.prompts import PromptTemplate
import boto3, json, os, sys
from getpass import getpass
from dotenv import load_dotenv
from langchain.document_loaders import TextLoader

# Load any variables defined in a local .env file into the environment
load_dotenv()


In [None]:
# Input your database keyspace name:
ASTRA_DB_KEYSPACE = input('Your Astra DB Keyspace name (e.g. vector): ')

### Astra DB Secure Connect Bundle

Please upload the Secure Connect Bundle zipfile to connect to your Astra DB instance.

The Secure Connect Bundle is needed to establish a secure connection to the database.
Click [here](https://awesome-astra.github.io/docs/pages/astra/download-scb/#c-procedure) for instructions on how to download it from Astra DB.

In [None]:
# Upload your Secure Connect Bundle zipfile:
import os
from google.colab import files


print('Please upload your Secure Connect Bundle')
uploaded = files.upload()
if uploaded:
    astraBundleFileTitle = list(uploaded.keys())[0]
    ASTRA_DB_SECURE_BUNDLE_PATH = os.path.join(os.getcwd(), astraBundleFileTitle)
else:
    raise ValueError(
        'Cannot proceed without Secure Connect Bundle. Please re-run the cell.'
    )

In [None]:
# Input your Astra DB token string, the one starting with "AstraCS:..."
from getpass import getpass
ASTRA_DB_APPLICATION_TOKEN = getpass('Your Astra DB Token ("AstraCS:..."): ')

In [None]:
# colab-specific override of helper functions
from cassandra.cluster import (
    Cluster,
)
from cassandra.auth import PlainTextAuthProvider


def getCQLSession(mode='astra_db'):
    if mode == 'astra_db':
        cluster = Cluster(
            cloud={
                "secure_connect_bundle": ASTRA_DB_SECURE_BUNDLE_PATH,
            },
            auth_provider=PlainTextAuthProvider(
                "token",
                ASTRA_DB_APPLICATION_TOKEN,
            ),
        )
        astraSession = cluster.connect()
        return astraSession
    else:
        raise ValueError('Unsupported CQL Session mode')

def getCQLKeyspace(mode='astra_db'):
    if mode == 'astra_db':
        return ASTRA_DB_KEYSPACE
    else:
        raise ValueError('Unsupported CQL Session mode')

In [None]:
# Input your AWS Access Key ID
AWS_ACCESS_KEY_ID = getpass('Your AWS Access Key ID: ')

In [None]:
# Input your AWS Secret Access Key
AWS_SECRET_ACCESS_KEY = getpass('Your AWS Secret Access Key: ')

In [None]:
# Input your AWS Session Token
AWS_SESSION_TOKEN = getpass('Your AWS Session Token: ')
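
Since the first cell already calls `load_dotenv()`, the three AWS values could also be read from the environment, with the prompt kept only as a fallback. A minimal sketch, assuming a hypothetical helper named `get_secret` (not part of the original notebook) and environment variables named after the notebook's own variables:

```python
import os
from getpass import getpass


def get_secret(name, prompt, env=None):
    """Return `name` from the environment, or ask for it interactively.

    `get_secret` is a hypothetical helper, shown for illustration only.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    # Nothing usable in the environment: fall back to a non-echoing prompt.
    return getpass(prompt)


# Possible usage in place of the bare getpass() calls:
# AWS_ACCESS_KEY_ID = get_secret('AWS_ACCESS_KEY_ID', 'Your AWS Access Key ID: ')
```

Passing `env` explicitly is only there to make the helper easy to try out without touching the real environment.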

In [2]:
# creation of the DB connection
cqlMode = 'astra_db'
session = getCQLSession(mode=cqlMode)
keyspace = getCQLKeyspace(mode=cqlMode)

The following lines set up the Bedrock client and the embedding object used during the quickstart.


In [3]:
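bedrock = boto3.client('bedrock', 'us-west-2', endpoint_url='https://invoke-bedrock.us-west-2.amazonaws.com',
                       aws_access_key_id=AWS_ACCESS_KEY_ID, aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
                       aws_session_token=AWS_SESSION_TOKEN)  # use the credentials entered above
br_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-g1-text-02", client=bedrock)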


### Set up a vectorstore in Astra

In [4]:
myCassandraVStore = Cassandra(
    embedding=br_embeddings,
    session=session,
    keyspace=keyspace,  # the keyspace name entered earlier
    table_name='shakespeare_act5',
)
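
To make "vector similarity" concrete: the store keeps one embedding vector per document and, at query time, returns the documents whose vectors are closest to the query's vector. The toy sketch below shows that ranking in plain Python. It assumes cosine similarity as the metric (a common choice, not necessarily the one configured for this table) and uses made-up 3-dimensional vectors; `cosine_similarity` and `top_k` are illustrative names, not part of LangChain or the Cassandra driver.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, vectors, k=2):
    """Return the k (label, score) pairs most similar to `query`."""
    scored = [(label, cosine_similarity(query, vec)) for label, vec in vectors.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Toy "embeddings"; real embedding vectors have far more dimensions.
toy_vectors = {
    "line_a": [1.0, 0.0, 0.0],
    "line_b": [0.9, 0.1, 0.0],
    "line_c": [0.0, 1.0, 0.0],
}
ranked = top_k([1.0, 0.0, 0.0], toy_vectors, k=2)
print(ranked)
```

In the real store the same idea is applied to the Bedrock embeddings, with the search executed by the database rather than in Python.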



In [None]:
# Retrieve the text of a scene from act 5 of "Romeo and Astra". The names were changed to prevent the LLM from "cheating" when providing an answer.
! mkdir -p texts
! curl https://raw.githubusercontent.com/synedra/astra_vector_examples/main/romeo_astra_act5.json --output texts/romeo_astra.json

# Load the downloaded file; the download above must use this same path
with open('texts/romeo_astra.json') as f:
    quote_array = json.load(f)


Next, populate the database with the lines from the play. This can take a couple of minutes, so please be patient.

In [5]:
for index in range(len(quote_array)):
    location = ''
    if (quote_array[index]['ActSceneLine'] != ''):
        (act, scene, line) = quote_array[index]['ActSceneLine'].split('.')
        location = 'Act {}, Scene {}, Line {}'.format(act, scene, line)
    # Store the location on the first line, then "speaker : text"
    quote_input = '{}\n{} : {}'.format(location, quote_array[index]['Player'], quote_array[index]['PlayerLine'])
    input_document = Document(page_content=quote_input)
    print(quote_input)
    myCassandraVStore.add_documents(documents=[input_document])
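
The formatting step in this loop can be pulled out into a small function, which makes it easy to check on a single record before loading the whole play. A sketch, assuming records with the same `ActSceneLine`, `Player` and `PlayerLine` keys the loop reads; the name `format_quote` and the sample record are hypothetical:

```python
def format_quote(entry):
    """Build the text stored for one line of the play."""
    location = ''
    if entry['ActSceneLine'] != '':
        act, scene, line = entry['ActSceneLine'].split('.')
        location = 'Act {}, Scene {}, Line {}'.format(act, scene, line)
    return '{}\n{} : {}'.format(location, entry['Player'], entry['PlayerLine'])


# Made-up record, shaped like the entries in the JSON file.
sample = {'ActSceneLine': '5.1.3', 'Player': 'PLAYER', 'PlayerLine': 'A sample line.'}
formatted = format_quote(sample)
print(formatted)
```

The formatted strings could also be collected into a list of `Document` objects and passed to `add_documents` in a single call, since that method accepts a list.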


Create the prompt template for the LLM query.  Give strong instructions about what you want the model to do.

In [6]:
generation_prompt_template = """Please answer a question from a user.
Remove any documents with potentially offensive content.
Create a summary of the remaining information between ## to answer the question.
Your task is to answer the question using only the summary, in at least 20 and no more than 50 words.
If the document does not contain the information needed to answer this question then simply write: Insufficient information.

#
{context}
#

question= {question}
Answer:"""
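
To see what the model will actually receive, the two placeholders can be filled by hand. The sketch below uses plain `str.format` on a shortened stand-in for the template. The context lines and the question are invented; in the quickstart the context would presumably come from a search against the vector store.

```python
# Shortened stand-in for generation_prompt_template, with the same placeholders.
template = """Answer using only the text between the # marks.

#
{context}
#

question= {question}
Answer:"""

# Invented context lines, standing in for documents returned by a search.
retrieved = [
    'Act 5, Scene 1, Line 1\nPLAYER : first invented line',
    'Act 5, Scene 1, Line 2\nPLAYER : second invented line',
]
prompt_text = template.format(
    context='\n\n'.join(retrieved),
    question='Who speaks in scene 1?',
)
print(prompt_text)
```

Printing the filled prompt before sending it is a cheap way to confirm that the delimiters and the retrieved text end up where the instructions say they are.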
