### RAG Approach with Amazon Bedrock and LangChain


With the RAG approach, we can draw on a large repository of contextual information, which lets the generative AI model produce more informed and accurate outputs. However, we must remain mindful of the model's token limits and carefully curate the context we include, balancing breadth and depth of knowledge while staying within those limits.
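
To make the token constraint concrete, here is a minimal sketch of trimming retrieved chunks to a token budget. The helper name `fit_context_to_budget` and the rule of thumb of roughly four characters per token are assumptions for illustration only; an exact count would require the model's own tokenizer.

```python
def fit_context_to_budget(chunks, max_tokens, chars_per_token=4):
    """Keep chunks, in order, until the estimated token budget is used up."""
    selected = []
    used_tokens = 0
    for chunk in chunks:
        # Rough estimate (assumption): about 4 characters per token.
        estimated = max(1, len(chunk) // chars_per_token)
        if used_tokens + estimated > max_tokens:
            break
        selected.append(chunk)
        used_tokens += estimated
    return selected, used_tokens


# Three hypothetical chunks of 400 characters each (about 100 tokens apiece).
chunks = ["a" * 400, "b" * 400, "c" * 400]
kept, used = fit_context_to_budget(chunks, max_tokens=250)
print(len(kept), used)
```

Because chunks are kept in order, passing them in ranked by relevance means the least relevant ones are dropped first.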


In this section, we look at the RAG approach for small document ingestion. The code sample shows how to run a semantic similarity search between the query and the source data, then augment the prompt with the pertinent contextual information before invoking the LLM.
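
As a rough illustration of what semantic similarity means here, the sketch below ranks a few toy vectors against a query vector using cosine similarity. The vectors, labels, and helper names are made up for the example; in the notebook, the embeddings come from the embedding model, and the vector store may use a different distance metric by default.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=3):
    """Return the k document labels most similar to the query vector."""
    scored = [(cosine_similarity(query_vec, vec), name) for name, vec in doc_vecs.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Toy 3-dimensional "embeddings" (hypothetical values, not real model output).
toy_vectors = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "privacy notice": [0.0, 0.2, 0.9],
}
toy_query = [0.8, 0.2, 0.0]
ranked = top_k(toy_query, toy_vectors, k=2)
print(ranked)
```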

In [None]:
# Install Chroma DB package
%pip install chromadb

In [None]:
# Import the required libraries
import json
import os

import boto3
import botocore
from langchain.document_loaders import TextLoader
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma

In [None]:
# Create the client-side Amazon Bedrock connection with the Boto3 library
region = os.environ.get("AWS_REGION")
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name=region,
)

In [None]:
# Load the document. Provide path to the document below.
loader = TextLoader('path/to/document.txt')
documents = loader.load()

In [None]:
# Split the documents into chunks
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)
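
To show how `chunk_size` and `chunk_overlap` interact, here is a simplified fixed-width sliding-window sketch. It is not the algorithm `CharacterTextSplitter` uses (that splitter works on separators and then merges pieces); the helper `sliding_window_chunks` is hypothetical and only illustrates the idea that consecutive chunks share some text.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-width chunks where neighbours share chunk_overlap characters."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = sliding_window_chunks(sample, chunk_size=10, chunk_overlap=4)
for piece in pieces:
    print(piece)
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of storing some text twice.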

In [None]:
# Create embeddings and store in Chroma vector store
from langchain_community.embeddings import BedrockEmbeddings
embeddings = BedrockEmbeddings(client=bedrock_runtime, model_id="amazon.titan-embed-text-v1")

In [None]:
db = Chroma.from_documents(texts, embeddings)

In [None]:
# Enter a user query 
query = "Enter your query here"

In [None]:
# Perform a similarity search to find the chunks most relevant to the query

relevant_docs = db.similarity_search(query, k=3)
full_context = '\n'.join(
    f'Document {i + 1}: {doc.page_content}' for i, doc in enumerate(relevant_docs)
)

print(full_context)

In [None]:
# The relevant documents are now collected in "full_context", so we can ask the LLM to
# generate an answer grounded in the retrieved documents. First, define the prompt template.

prompt_template = """Answer the user's question based solely on the information provided between the <context></context> XML tags. Think step by step and provide detailed instructions.
<context>
{context}
</context>

Question: {human_input}
Answer:"""

In [None]:
PROMPT = PromptTemplate.from_template(prompt_template)

# Fill in the template to create the prompt input for the LLM
prompt_data_input = PROMPT.format(human_input=query, context=full_context)

In [None]:
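# Now invoke the foundation model using boto3 to generate the output response.
# Example generation settings (assumed values; adjust them for your use case).
model_parameters = {"maxTokenCount": 512, "temperature": 0, "topP": 1}
body = json.dumps({"inputText": prompt_data_input, "textGenerationConfig": model_parameters})
accept = "application/json"
contentType = "application/json"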

In [None]:
# You can change this modelID to use an alternate version from the model provider
modelId = "amazon.titan-tg1-large"

In [None]:
response = bedrock_runtime.invoke_model(
    body=body, modelId=modelId, accept=accept, contentType=contentType)

In [None]:
generated_response_body = json.loads(response.get("body").read())
print(generated_response_body.get("results")[0].get("outputText").strip())
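
The response handling above can be wrapped in a small helper that fails loudly when no result comes back. This is a minimal sketch: `extract_output_text` is a hypothetical helper, and the stubbed response only mimics the `results[0].outputText` shape that the cell above reads, so that the example runs without calling Bedrock.

```python
import io
import json


def extract_output_text(response):
    """Read the streamed body and return the first generated text, stripped."""
    payload = json.loads(response["body"].read())
    results = payload.get("results") or []
    if not results:
        raise ValueError("Model response contained no results")
    return results[0].get("outputText", "").strip()


# Stub standing in for the invoke_model response (illustrative only).
fake_payload = {"results": [{"outputText": "  Paris is the capital.  "}]}
fake_response = {"body": io.BytesIO(json.dumps(fake_payload).encode("utf-8"))}
answer = extract_output_text(fake_response)
print(answer)
```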