# Using AstraPy with a vector database

This notebook shows how to work with AstraPy.

Make sure you have a database, and be ready to upload the Secure Connect Bundle and supply the Token string
(see [Pre-requisites](https://cassio.org/start_here/#vector-database) on cassio.org for details). Remember that you need a **custom Token** with the [Database Administrator](https://awesome-astra.github.io/docs/pages/astra/create-token/) role.

### Set up your Python environment




In [None]:
!curl -H 'Cache-Control: no-cache, no-store'  https://raw.githubusercontent.com/synedra/astra_vector_examples/main/notebook/astrapy.zip --output astrapy.zip
%pip install astrapy.zip


### Import needed libraries



In [2]:
from astrapy.base import AstraClient, http_methods
from astrapy.serverless import AstraJsonClient
from astrapy.vector import AstraVectorClient
from astrapy.ops import AstraOps
import os




### Astra Setup



### Astra Variables

In [None]:
# Input your Astra DB database ID:
ASTRA_DB_ID = input('Your Astra DB id: ')

In [None]:
# Input your database keyspace name:
ASTRA_DB_KEYSPACE = input('Your Astra DB Keyspace name (e.g. vector): ')

In [None]:
# Input your Astra DB token string, the one starting with "AstraCS:..."
from getpass import getpass
ASTRA_DB_APPLICATION_TOKEN = getpass('Your Astra DB Token ("AstraCS:..."): ')

In [None]:
# Input your Astra DB region (not a secret, so a plain input() is enough)
ASTRA_DB_REGION = input('Your Astra DB Region: ')

In [2]:
# Input the name of the collection used in this tutorial
TEST_COLLECTION_NAME = input('Collection name for tutorial (e.g. "vector_test"): ')
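
For non-interactive runs, the same settings could be read from environment variables, with a prompt only as a fallback. The sketch below is one way to do that; the helper name `read_setting` and the idea of using environment variables named after the Python variables are assumptions for illustration, not part of AstraPy.

```python
import os
from getpass import getpass


def read_setting(name, prompt, secret=False, env=None):
    """Return `name` from the environment, or prompt the user if it is unset.

    `read_setting` is a hypothetical helper written for this tutorial.
    """
    env = os.environ if env is None else env
    value = env.get(name)
    if value:
        return value
    # Hide the typed value for secrets such as the token
    ask = getpass if secret else input
    return ask(prompt)
```

With this helper, a cell such as `ASTRA_DB_REGION = read_setting("ASTRA_DB_REGION", "Your Astra DB Region: ")` prompts only when the variable is not already set, and passing `secret=True` keeps the token hidden while typing.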

In [None]:
# Client setup
astra_client = AstraClient(
    astra_database_id=ASTRA_DB_ID,
    astra_database_region=ASTRA_DB_REGION,
    astra_application_token=ASTRA_DB_APPLICATION_TOKEN,
)

vector_client = AstraVectorClient(astra_client=astra_client)
test_namespace = vector_client.namespace(ASTRA_DB_KEYSPACE)
test_collection = test_namespace.collection(TEST_COLLECTION_NAME)

### Create and view collections

In [None]:
# Create a collection for 5-dimensional vectors, then list all collections
res = test_namespace.create_vector_collection(name=TEST_COLLECTION_NAME, size=5)
print(res)
res = test_namespace.get_collections()
print(res)
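
The collection is created with `size=5`, and every document in this tutorial carries a `$vector` of five numbers. Assuming `size` is the required vector length, a small client-side check can catch mismatches before a request is sent. The function `check_vector_size` is a hypothetical helper, not an AstraPy call.

```python
def check_vector_size(document, size):
    """Raise ValueError unless the document's "$vector" has exactly `size` entries."""
    vector = document.get("$vector")
    if vector is None:
        raise ValueError("document has no $vector field")
    if len(vector) != size:
        raise ValueError(f"expected {size} dimensions, got {len(vector)}")
    return document


# Illustrative document; the "_id" value is made up
doc = {"_id": "demo", "$vector": [0.25, 0.25, 0.25, 0.25, 0.25]}
check_vector_size(doc, size=5)  # passes: five entries match size=5
```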

### Insert one document

In [None]:
json_query = {
        "_id": "4",
        "name": "Coded Cleats Copy",
        "description": "ChatGPT integrated sneakers that talk to you",
        "$vector": [0.25, 0.25, 0.25, 0.25, 0.25],
    }

res = test_collection.create(document=json_query)
print(res)

### Insert multiple documents

In [None]:
json_query = [
        {
            "_id": "1",
            "name": "Coded Cleats",
            "description": "ChatGPT integrated sneakers that talk to you",
            "$vector": [0.1, 0.15, 0.3, 0.12, 0.05],
        },
        {
            "_id": "2",
            "name": "Logic Layers",
            "description": "An AI quilt to help you sleep forever",
            "$vector": [0.45, 0.09, 0.01, 0.2, 0.11],
        },
        {
            "_id": "3",
            "name": "Vision Vector Frame",
            "description": "Vision Vector Frame - A deep learning display that controls your mood",
            "$vector": [0.1, 0.05, 0.08, 0.3, 0.6],
        },
    ]

res = test_collection.insert_many(documents=json_query)
print(res)
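
To build intuition for what a vector search over these documents would rank, the sketch below scores the three sample vectors locally. It assumes cosine similarity as the metric, which this notebook does not state; the database may use a different one. The function names are hypothetical and nothing here calls AstraPy.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query, documents):
    """Return document ids ordered from most to least similar to `query`."""
    scored = [(cosine_similarity(query, d["$vector"]), d["_id"]) for d in documents]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]


documents = [
    {"_id": "1", "$vector": [0.1, 0.15, 0.3, 0.12, 0.05]},
    {"_id": "2", "$vector": [0.45, 0.09, 0.01, 0.2, 0.11]},
    {"_id": "3", "$vector": [0.1, 0.05, 0.08, 0.3, 0.6]},
]
# Query with the vector of document "3", so that document ranks first
ranking = rank_by_similarity([0.1, 0.05, 0.08, 0.3, 0.6], documents)
print(ranking)
```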

In [None]:
# Input your AWS Session Token
os.environ["AWS_SESSION_TOKEN"] = getpass('Your AWS Session Token: ')

### Set up AWS Bedrock objects



In [3]:
import boto3
from langchain.embeddings import BedrockEmbeddings
bedrock_runtime = boto3.client('bedrock-runtime', region_name='us-west-2')
bedrock_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=bedrock_runtime)


### Set up a vectorstore in Astra

In [4]:
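from langchain.vectorstores import Cassandra

# `session` must be an open Cassandra driver session for your Astra database
# (created with the Secure Connect Bundle and Token); it is not defined above.
myCassandraVStore = Cassandra(
    embedding=bedrock_embeddings,
    session=session,
    keyspace=ASTRA_DB_KEYSPACE,
    table_name='shakespeare_act5',
)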


### Populate the database

Add the lines of Romeo and Astra, Act 5, Scene 3.



In [None]:
import json

# Retrieve the text of a scene from Act 5 of Romeo and Astra.
# Juliet's name was changed to Astra to prevent the LLM from "cheating" when providing an answer.
! mkdir -p texts
! curl https://raw.githubusercontent.com/synedra/astra_vector_examples/main/romeo_astra.json --output texts/romeo_astra.json
with open('texts/romeo_astra.json') as f:
    quote_array = json.load(f)


Next, populate the database with the lines from the play.
There are 321 lines in total, so this can take a couple of minutes.


In [5]:
from langchain.schema import Document

for quote in quote_array:
    # Build a readable location such as "Act 5, Scene 3, Line 1" when one is given
    location = ''
    if quote['ActSceneLine'] != '':
        act, scene, line = quote['ActSceneLine'].split('.')
        location = 'Act {}, Scene {}, Line {}'.format(act, scene, line)
    quote_input = '{} : {} : {}'.format(location, quote['Player'], quote['PlayerLine'])
    input_document = Document(page_content=quote_input)
    print(quote_input)
    myCassandraVStore.add_documents(documents=[input_document])
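
The string built for each quote can be tried in isolation, without a database or an embedding model. The sketch below pulls the formatting step into a function; `format_quote` is a hypothetical name, and the sample record uses the field names from the loop with made-up values.

```python
def format_quote(record):
    """Build the 'location : player : line' string stored for one quote."""
    location = ""
    if record["ActSceneLine"] != "":
        act, scene, line = record["ActSceneLine"].split(".")
        location = f"Act {act}, Scene {scene}, Line {line}"
    return f"{location} : {record['Player']} : {record['PlayerLine']}"


# Hypothetical record: field names follow the loop above, values are made up
sample = {"ActSceneLine": "5.3.1", "Player": "PARIS", "PlayerLine": "An example line."}
print(format_quote(sample))
```

Records with an empty `ActSceneLine` (stage directions, for instance) keep an empty location, so the stored string starts with ` : `.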


### Ask a question
Enter a question and create the prompt template for the LLM query.  
Give instructions about what you want the model to do.


In [None]:
# Enter a question about Romeo and Astra (Like 'How did Astra die?')
QUESTION_FOR_MODEL = input('Your question about Romeo and Astra: ')

In [6]:
generation_prompt_template = """Please answer a question from a user.
Create a summary of the information between the # lines to answer the question.
Your task is to answer the question in 20 words, using only the summary.

#
{context}
#

question= {question}
Answer:"""


### Create a retriever
The retriever returns the documents most relevant to the question. With `k` set to 2, it returns the two closest matches.


In [6]:
retriever = myCassandraVStore.as_retriever(search_kwargs={'k': 2})
output = retriever.get_relevant_documents(QUESTION_FOR_MODEL)

# Join the retrieved passages into one context string, separated by ' *** '
context = ''
for document in output:
    context += ' *** ' + document.page_content

print(context)


### Build the LLM prompt


In [6]:
llm_prompt = generation_prompt_template.format(
    question=QUESTION_FOR_MODEL,
    context=context,
)
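
The retrieval and formatting steps can be checked end to end without calling any service. The sketch below uses a shortened stand-in template and made-up passages in place of retriever output; all values are assumptions for illustration.

```python
# Shortened stand-in for the tutorial's prompt template
template = """Answer the question using only the context between the # lines.

#
{context}
#

question= {question}
Answer:"""

# Hypothetical retrieved passages standing in for retriever output
passages = ["first passage", "second passage"]
context = "".join(" *** " + p for p in passages)

prompt = template.format(context=context, question="example question?")
print(prompt)
```

Printing the finished prompt before sending it is a cheap way to confirm that the context landed between the delimiters and that the question was filled in.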


### Generate the answer
Generate the answer from the prompt, using `amazon.titan-tg1-large` as the LLM.


In [6]:
body = json.dumps({'inputText': llm_prompt})
modelId = 'amazon.titan-tg1-large'
accept = 'application/json'
contentType = 'application/json'
response = bedrock_runtime.invoke_model(body=body, modelId=modelId, accept=accept, contentType=contentType)
response_body = json.loads(response.get('body').read())
print('Question: ' + QUESTION_FOR_MODEL)
print('Answer: ' + response_body.get('results')[0].get('outputText'))
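
The cell above reads the answer from `results[0].outputText` in the decoded response body. A slightly more defensive way to extract it is sketched below, using a made-up body of that shape instead of a real Bedrock call; `extract_answer` is a hypothetical helper.

```python
import json


def extract_answer(raw_body):
    """Return the first generated text from a JSON body shaped like the one read above."""
    payload = json.loads(raw_body)
    results = payload.get("results") or []
    if not results:
        raise ValueError("response contains no results")
    return results[0].get("outputText", "").strip()


# Hypothetical body with the shape the cell above reads: results[0].outputText
sample_body = json.dumps({"results": [{"outputText": "\nAn example answer."}]})
print(extract_answer(sample_body))
```

Raising on an empty `results` list gives a clearer error than the `IndexError` that plain indexing would produce.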


### Additional resources
To learn more about Amazon Bedrock, see the introduction in the amazon-bedrock-samples repository: https://github.com/aws-samples/amazon-bedrock-samples/tree/main/introduction-to-bedrock