# Amazon Bedrock RAG Template Demo

This Jupyter notebook gives a short demonstration of the Bedrock RAG use case template, in which Amazon Bedrock invocations are augmented with embeddings retrieved from an Amazon Aurora vector database.

## Agenda:

- Installing requirements
- Embedding definition
- Database connection 
- Data ingestion
- Retrieval augmented text generation
- Relevant document queries


## Installing requirements

In [None]:
!pip install langchain==0.2.1 
!pip install langchain-community==0.2.1
!pip install pgvector==0.2.5 
!pip install psycopg2-binary==2.9.9 
!pip install pydantic-settings==2.1.0 
!pip install instructor==0.3.5 
!pip install tiktoken==0.7.0
!pip install boto3==1.34.101 
!pip install langchain_aws==0.1.6 

## Initialization

### Imports and the creation of the boto3 session

In [None]:

import boto3
from boto3 import Session
import json
import logging
import time
import psycopg2
from langchain_community.vectorstores.pgvector import DistanceStrategy, PGVector
from langchain_community.embeddings.bedrock import BedrockEmbeddings
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_aws import ChatBedrock
from langchain_core.prompts import ChatPromptTemplate


# Configure the logger
logger = logging.getLogger(__name__)

# Use the session to create a client
session = boto3.Session()
credentials = session.get_credentials()


### Retrieving environment variables from the SSM parameter store   

The Terraform deployment saves all essential environment variables to the AWS SSM parameter store. To retrieve those, we use the following helper function.


In [None]:
def get_ssm_parameter(session: Session, parameter_name: str, prefix: str = '/bedrock-rag-template/') -> str:
    """Retrieve a parameter's value from AWS SSM Parameter Store.

    Args:
        session (Session): the boto3 session to use to retrieve the parameters
        parameter_name (str): the name of the parameter
        prefix (str, optional): Parameter's prefix. Defaults to '/bedrock-rag-template/'.

    Returns:
        str: the value of the parameter
    """
    ssm = session.client('ssm')
    response = ssm.get_parameter(Name=prefix + parameter_name)
    return response['Parameter']['Value']


# Setup env variables
VECTOR_DB_INDEX = get_ssm_parameter(session, 'VECTOR_DB_INDEX')
PG_VECTOR_DB_NAME = get_ssm_parameter(session, 'PG_VECTOR_DB_NAME')
PG_VECTOR_PORT = get_ssm_parameter(session, 'PG_VECTOR_PORT')
PG_VECTOR_SECRET_ARN = get_ssm_parameter(session, 'PG_VECTOR_SECRET_ARN')
PG_VECTOR_DB_HOST = get_ssm_parameter(session, 'PG_VECTOR_DB_HOST')
S3_BUCKET_NAME = get_ssm_parameter(session, 'S3_BUCKET_NAME')
EMBEDDING_LENGTH = 1024  # specific for titanv2
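The helper above issues one API call per parameter. As a rough sketch, assuming every parameter lives under the `/bedrock-rag-template/` prefix, the SSM `get_parameters_by_path` paginator could fetch them in a single pass. The function name `get_ssm_parameters_by_prefix` and its return shape are hypothetical and not part of the template.

```python
from typing import Any, Dict


def get_ssm_parameters_by_prefix(ssm_client: Any, prefix: str = "/bedrock-rag-template/") -> Dict[str, str]:
    """Return {short_name: value} for every parameter stored under `prefix`.

    `ssm_client` is expected to behave like `session.client("ssm")`.
    The function name and return shape are illustrative assumptions.
    """
    # Query the hierarchy without the trailing slash; fall back to "/" for the root.
    path = prefix.rstrip("/") or "/"
    paginator = ssm_client.get_paginator("get_parameters_by_path")
    values: Dict[str, str] = {}
    for page in paginator.paginate(Path=path, Recursive=True, WithDecryption=True):
        for parameter in page["Parameters"]:
            name = parameter["Name"]
            # Strip the shared prefix so keys match the names used in the notebook.
            short_name = name[len(prefix):] if name.startswith(prefix) else name
            values[short_name] = parameter["Value"]
    return values
```

With a real client this would be called as `params = get_ssm_parameters_by_prefix(session.client("ssm"))`, after which `params["VECTOR_DB_INDEX"]` and the other values can be read from the dictionary.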





## Create the Amazon Bedrock Embedding


**Prerequisite:** Make sure that access to the Amazon Bedrock models has been requested and granted. For details, see [Model access](https://docs.aws.amazon.com/bedrock/latest/userguide/model-access.html).


To create the LangChain vector store, we need to provide a LangChain embedding. The ID of the embedding model must be the same as the one used to create the embeddings in the first place, in this case:

In [None]:
embedding_model_id = "amazon.titan-embed-text-v2:0" 

br = session.client("bedrock-runtime")
bedrock_embedding = BedrockEmbeddings(client=br, model_id=embedding_model_id)


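# Smoke test: invoke the embedding model once to confirm that model access is enabled.
try:
    br.invoke_model(
        modelId=embedding_model_id,
        contentType="application/json",
        accept="*/*",
        body=json.dumps({
            "inputText": "this is where you place your input text",
            "dimensions": 512,
            "normalize": True,
        }),
    )
except Exception as e:
    # Log the underlying error; missing model access is the most likely cause.
    logger.error(f"Model invocation failed. Please check that model access is enabled: {e}")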
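The access check above discards the model response. A minimal sketch, assuming the Titan Text Embeddings V2 request fields used above (`inputText`, `dimensions`, `normalize`) and a JSON response body that carries the vector under an `embedding` key, shows how the same call could return the embedding. The helper name `embed_text` is hypothetical.

```python
import json
from typing import Any, List


def embed_text(
    bedrock_runtime: Any,
    text: str,
    model_id: str = "amazon.titan-embed-text-v2:0",
    dimensions: int = 1024,
) -> List[float]:
    """Return the embedding vector for `text`.

    `bedrock_runtime` is expected to behave like `session.client("bedrock-runtime")`.
    The response shape (an "embedding" list) is an assumption about the Titan V2 model.
    """
    body = json.dumps({"inputText": text, "dimensions": dimensions, "normalize": True})
    response = bedrock_runtime.invoke_model(
        modelId=model_id,
        contentType="application/json",
        accept="application/json",
        body=body,
    )
    # The body is a stream; read it once and parse the JSON payload.
    payload = json.loads(response["body"].read())
    return payload["embedding"]
```

Calling `len(embed_text(br, "hello"))` would then be a quick way to confirm that the returned dimension matches the length configured for the vector store.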

## Add embeddings to the vector store for RAG 

To exercise the ingestion pipeline, which is triggered by Amazon S3 bucket notifications, we upload the following file to the Amazon S3 bucket. To validate the ingestion, you can look up the latest invocation of the AWS Lambda function and verify that it executed.


In [None]:
file_content = """
### Company Overview: TechWorldNova Solutions
**TechWorldNova Solutions** is an innovative technology firm specializing in artificial intelligence and cloud computing solutions. 
Since its founding in 2015, TechWorldNova has been at the forefront of technological advancements, providing cutting-edge products and services to a diverse range of industries.

### Growth and Revenue Highlights

- **2018:**
  - **Revenue:** $15 million
  - **Growth:** 25%
- **2019:**
  - **Revenue:** $20 million
  - **Growth:** 33%
- **2020:**
  - **Revenue:** $30 million
  - **Growth:** 50%
- **2021:**
  - **Revenue:** $45 million
  - **Growth:** 50%
- **2022:**
  - **Revenue:** $60 million
  - **Growth:** 33%
- **2023:**
  - **Revenue:** $80 million
  - **Growth:** 33%

### Key Milestones
- **2017:** Launched first AI-powered analytics platform.
- **2019:** Expanded operations to Europe and Asia.
- **2021:** Introduced cloud computing solutions, gaining significant market traction.
- **2023:** Reached 500+ enterprise clients and crossed $80 million in revenue.

### Future Outlook

TechWorldNova Solutions aims to continue its upward trajectory by investing in research and development, 
exploring new markets, and enhancing its product offerings. 
The company's vision is to be a global leader in AI and cloud computing, driving innovation and delivering exceptional value to its clients."""

In [None]:
s3 = session.client("s3")
s3.put_object(
    Bucket=S3_BUCKET_NAME,
    Key="rag-template-file.txt",
    Body=file_content.encode('utf-8')
)

# wait for the document to be processed.
time.sleep(15)
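The fixed `time.sleep(15)` only waits; it does not confirm that the ingestion Lambda function actually ran. One possible check, sketched under the assumption that the function writes to the default `/aws/lambda/<function-name>` CloudWatch log group, is to read the timestamp of its most recent log event. The function name is not stored in this notebook, so `function_name` below is a placeholder that you would need to supply, and the helper name is hypothetical.

```python
from typing import Any, Optional


def latest_lambda_log_event_ms(logs_client: Any, function_name: str) -> Optional[int]:
    """Return the timestamp (ms since epoch) of the newest log event, or None.

    `logs_client` is expected to behave like `session.client("logs")`.
    The log group name follows the default Lambda convention, which is an assumption here.
    """
    response = logs_client.describe_log_streams(
        logGroupName=f"/aws/lambda/{function_name}",
        orderBy="LastEventTime",
        descending=True,
        limit=1,
    )
    streams = response.get("logStreams", [])
    if not streams:
        return None
    # The field can be missing on a brand-new stream, hence .get().
    return streams[0].get("lastEventTimestamp")
```

Comparing the returned value with the upload time (in milliseconds) gives a rough indication of whether the function was invoked after the file was written. CloudWatch updates this timestamp with some delay, so treat it as a hint rather than proof.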

### Establish a connection to Amazon Aurora and create the LangChain vector store
To get the secret for the database, we use the following helper function.

In [None]:
def get_db_secret_value(secret_arn: str) -> dict:
    """Get the secret value from AWS Secrets Manager.

    Args:
        secret_arn (str): ARN of the secret

    Returns:
        dict: Parsed JSON secret, including the 'username' and 'password' keys
    """
    # Reuse the boto3 session created during initialization
    client = session.client('secretsmanager')
    get_secret_value_response = client.get_secret_value(SecretId=secret_arn)
    return json.loads(get_secret_value_response['SecretString'])


logger.info(f"Retrieve secret from {PG_VECTOR_SECRET_ARN}")
credentials = get_db_secret_value(PG_VECTOR_SECRET_ARN)


connection_string = PGVector.connection_string_from_db_params(
    driver="psycopg2",
    host=PG_VECTOR_DB_HOST,
    port=PG_VECTOR_PORT,
    database=PG_VECTOR_DB_NAME,
    user=credentials['username'],
    password=credentials['password']
)


vector_store = PGVector(
    connection_string=connection_string,
    collection_name=VECTOR_DB_INDEX,
    embedding_function=bedrock_embedding,
    embedding_length=EMBEDDING_LENGTH,
    distance_strategy=DistanceStrategy.COSINE,
)
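`DistanceStrategy.COSINE` ranks chunks by the cosine distance between the query embedding and each stored embedding. The database performs this computation; the plain-Python sketch below only illustrates the formula (one minus the cosine similarity), and the function name is chosen for illustration.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Return 1 - cos(angle between a and b); 0 means same direction, 2 means opposite."""
    if len(a) != len(b):
        raise ValueError("Vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("Cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)
```

Because the distance depends only on direction, vectors that differ only in magnitude have a distance of zero.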


### Verify that the embedding is present in the vector store

We check whether there is a document similar to the string "TechWorldNova Solutions" to verify the presence of the embedding in the vector store.

In [None]:
# Poll the vector store until the ingested document shows up (at most 10 attempts)
for _ in range(10):
    ingested_docs = vector_store.similarity_search("TechWorldNova Solutions")
    if len(ingested_docs) > 0:
        print("Relevant documents found")
        break
    time.sleep(5)

vector_store.similarity_search("TechWorldNova Solutions")
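The wait-and-retry pattern used above can be factored into a small reusable helper. This is only a sketch: the name `poll_until` and its parameters are hypothetical, and the `sleep` argument exists solely so the helper can be exercised without real waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    attempts: int = 10,
    delay_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Call `fetch` until it returns a truthy value or `attempts` is exhausted.

    Returns the first truthy result, or None if every attempt came back empty.
    """
    for attempt in range(attempts):
        result = fetch()
        if result:
            return result
        # Do not sleep after the final attempt.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return None
```

In this notebook it could be used as `poll_until(lambda: vector_store.similarity_search("TechWorldNova Solutions"))`, which returns the matching documents as soon as ingestion has finished.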

## Retrieval augmented text generation using Bedrock Claude and the PGVector vector store


Next, we define a system prompt and test the retrieval augmentation with a question about the fictitious company `TechWorldNova Solutions`, whose description we ingested above. Because the company is fictitious, the foundation model cannot have been trained on the answer. We run the test with Anthropic Claude 3.



### Prepare the retriever and the system prompt

In [None]:

retriever = vector_store.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={'score_threshold': 0.8},
)

system_prompt = (
    "Use the given context to answer the question. "
    "If you don't know the answer, say you don't know. "
    "Use three sentences maximum and keep the answer concise. "
    "Context: {context}"
)
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("human", "{input}"),
    ]
)

query = "What is the mission of TechWorldNova Solutions?"
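The `{context}` placeholder in the system prompt is filled by the chain built with `create_stuff_documents_chain`, which places the text of all retrieved chunks into the prompt. The sketch below imitates that idea in plain Python. It assumes the chunks are joined with a blank line, and the helper name `stuff_context` is hypothetical rather than a LangChain API.

```python
from typing import List


def stuff_context(template: str, chunks: List[str], separator: str = "\n\n") -> str:
    """Join all retrieved chunks and substitute them for `{context}` in the template."""
    return template.format(context=separator.join(chunks))


# Example with two made-up chunks
example = stuff_context(
    "Use the given context to answer the question. Context: {context}",
    ["Chunk about revenue.", "Chunk about milestones."],
)
print(example)
```

Since every retrieved chunk ends up in a single prompt, the number and size of chunks directly affect how many input tokens each invocation consumes.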

### Claude 3

In [None]:
model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
model_kwargs =  { 
    "max_tokens": 2048,  
}


llm = ChatBedrock(
    model_id=model_id,
    model_kwargs=model_kwargs,
)


question_answer_chain = create_stuff_documents_chain(llm, prompt)
chain = create_retrieval_chain(retriever, question_answer_chain)
response = chain.invoke({"input": query})["answer"]
print(f"CHATBOT ANSWER CLAUDE 3: {response}")


## Retrieve relevant documents for the query (optional)
Run the following cell to see the relevance scores of the chunks retrieved for the query.

In [None]:
doc_scores = vector_store.similarity_search_with_relevance_scores(query, k=20)

docs = []
for doc, score in doc_scores:
    doc.metadata["document_score"] = score
    docs.append(doc)

for item in docs:
    print(item)
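The scores printed above also explain which chunks the retriever passes to the model: with `search_type="similarity_score_threshold"`, only chunks whose relevance score reaches the threshold (0.8 in this notebook) are kept. A minimal sketch of that filtering step, with a hypothetical helper name and plain tuples standing in for the `(document, score)` pairs:

```python
from typing import Any, List, Tuple


def filter_by_score(doc_scores: List[Tuple[Any, float]], threshold: float = 0.8) -> List[Tuple[Any, float]]:
    """Keep (document, score) pairs with score >= threshold, best match first."""
    kept = [(doc, score) for doc, score in doc_scores if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)
```

If the chain answers that it does not know, an empty result from `filter_by_score(doc_scores)` suggests that the threshold is too strict for the query.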


## Retrieve the raw data from the vector store (optional)
If you want to explore the raw vector store, you can use the query below, which fetches all records. This is only practical when just a few documents are present in the database.

In [None]:
conn = psycopg2.connect(host=PG_VECTOR_DB_HOST,
            port=PG_VECTOR_PORT,
            database=PG_VECTOR_DB_NAME,
            user=credentials['username'],
            password=credentials['password'])
cur = conn.cursor()
cur.execute("SELECT * FROM langchain_pg_embedding")
rows = cur.fetchall()

# Release the database resources once the rows are in memory
cur.close()
conn.close()

# Columns of each row:
# row[0] - document IDs
# row[1] - embeddings
# row[2] - plain text documents
# row[3] - document metadata

# Print the plain text documents
print([row[2] for row in rows])
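Selecting columns by name avoids relying on positional indexes and lets you cap the number of rows. The sketch below assumes that the table exposes the columns `document` and `cmetadata`, as created by the LangChain PGVector integration; the helper name `fetch_documents` is hypothetical.

```python
from typing import Any, List, Tuple


def fetch_documents(conn: Any, limit: int = 10) -> List[Tuple[Any, Any]]:
    """Return up to `limit` (document, cmetadata) rows from the embedding table.

    `conn` is expected to behave like a psycopg2 connection.
    The column names are an assumption about the table schema.
    """
    with conn.cursor() as cur:
        # Pass the limit as a query parameter instead of formatting it into the SQL string.
        cur.execute(
            "SELECT document, cmetadata FROM langchain_pg_embedding LIMIT %s",
            (limit,),
        )
        return cur.fetchall()
```

The `LIMIT` keeps the result manageable even when the table holds many chunks.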