## Module 2, Activity 2: Implementing a RAG Pipeline

So far we have created an in-memory vector database and populated it with some text. Now we will see how to build a pipeline on top of that database.

In [1]:
import boto3
import json
import time



from langchain_aws import BedrockEmbeddings, ChatBedrock, ChatBedrockConverse
from langchain.chains import RetrievalQA
from langchain_core.vectorstores import InMemoryVectorStore
from langchain.text_splitter import CharacterTextSplitter, RecursiveCharacterTextSplitter

In [2]:
def get_data_from_s3(bucket_name, key):
    """Download an S3 object and return its contents as a UTF-8 string."""
    s3 = boto3.client(
        's3',
        region_name="us-west-2",
    )
    response = s3.get_object(Bucket=bucket_name, Key=key)
    data = response['Body'].read().decode('utf-8')

    return data

In [3]:
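session = boto3.session.Session()
region = session.region_name  # region from the local AWS configuration (not used below)
# The Bedrock client is pinned to us-west-2, matching the S3 client above.
bedrock_runtime = boto3.client("bedrock-runtime", region_name='us-west-2')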

## Introduction

As before, we are going to follow the same steps for creating our vectors for RAG:

1. Load in text data
2. Create a splitter with a defined chunking strategy
3. Create chunks (a list of LangChain documents)
4. Create an embedding for each chunk
5. Store the chunk and embedding in the vector store

In [4]:
s3_data = get_data_from_s3("dpgenaitraining", "q2_results.txt")
s3_data[0:200]

'BILL Reports Second Quarter Fiscal Year 2025 Financial Results\nFebruary 6, 2025\n\n\t•\tQ2 Core Revenue Increased 16% Year-Over-Year\n\t•\tQ2 Total Revenue Increased 14% Year-Over-Year\nSAN JOSE, Calif.--(BUS'

In [5]:
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
chunks = splitter.create_documents([s3_data])
print(f"Total chunks created: {len(chunks)}")

Total chunks created: 43


In [6]:
embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v2:0", client=bedrock_runtime)

In [7]:
vector_store = InMemoryVectorStore.from_documents(chunks, embeddings)

## Bringing the LLM back into it

In Module 3, Activity 1 we used cosine similarity to find the top `k` most similar documents. On its own, though, that is not very useful, because what comes back reads like an arbitrary chunk of text. It would be much better if those chunks were turned into an easily understood answer. So now we will bring the LLM back in!
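
As a reminder of what that top-`k` lookup does, here is a minimal pure-Python sketch. The helper names `cosine_similarity` and `top_k` and the toy 3-dimensional vectors are assumptions made for illustration; real embeddings have far more dimensions, and the vector store performs this search for you.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, doc_vecs, k=3):
    """Return (index, score) pairs for the k most similar document vectors."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Toy "embeddings" (hypothetical values, not real model output).
toy_docs = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
toy_query = [1.0, 0.05, 0.0]

top_matches = top_k(toy_query, toy_docs, k=2)
print(top_matches)  # list of (document index, similarity score), best first
```

The result is only a ranked list of chunk indices and scores, which is why the retrieved chunks still need an LLM to turn them into a readable answer.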

We will use an LLM (note that it is a different model from the one used for the embeddings!) to take the information returned from the vector store and turn it into an actual answer to the question. To do so, we first _retrieve_ information from the vector store and then run the retrieved results through the LLM. If this reminds you of the chains we created in previous modules, it should! We will create a retrieval question-answer chain using `RetrievalQA` below, chaining our LLM and retriever together. (Note that you could, and should, add a prompt to this, but we are leaving it out for now for the sake of brevity.)
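
Conceptually, the retrieve-then-generate step amounts to placing the retrieved chunks into the LLM's prompt alongside the question. The sketch below shows that idea in plain Python. The helper `build_stuffed_prompt`, the instruction wording, and the example chunks are all hypothetical; this is not the template LangChain uses internally.

```python
def build_stuffed_prompt(question, retrieved_chunks):
    """Join retrieved chunks into one context block and wrap it in instructions."""
    context = "\n\n".join(chunk.strip() for chunk in retrieved_chunks)
    return (
        "Use only the context below to answer the question. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Placeholder chunks standing in for whatever the retriever returns.
example_chunks = [
    "Chunk one of retrieved text.",
    "Chunk two of retrieved text.",
]

prompt_text = build_stuffed_prompt("What were the subscription fees?", example_chunks)
print(prompt_text)
```

An explicit instruction such as "use only the context below" is one reason to add your own prompt: it tells the model what to do when the retrieved chunks do not contain the answer.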

In [8]:
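llm = ChatBedrockConverse(
    model="anthropic.claude-3-sonnet-20240229-v1:0",
    temperature=0.0,  # keep the output as repeatable as possible
)

qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # place all retrieved chunks into a single prompt
    retriever=vector_store.as_retriever(search_kwargs={"k": 3})  # retrieve the top 3 chunks
)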

In [9]:
qa_chain.invoke('What were the subscription fees?')

{'query': 'What were the subscription fees?',
 'result': 'According to the financial highlights provided, the subscription fees for the second quarter were $67.7 million, up 7% year-over-year.'}

## Concluding thoughts

We now have everything we need to create RAG pipelines. Be sure to experiment with the questions you ask. Try different chunking strategies to see how the output changes. Try adding a prompt. Most importantly, create a different chain that does NOT include the retriever to see how the answers differ when you don't use RAG.
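
To get a feel for how chunking parameters change what ends up in the vector store, here is a small pure-Python sketch. The function `fixed_size_chunks` is a hypothetical, simplified stand-in: it cuts at fixed character offsets, whereas `RecursiveCharacterTextSplitter` prefers to split on separators such as paragraph and line breaks, so the counts will not match what the real splitter produces.

```python
def fixed_size_chunks(text, chunk_size, chunk_overlap):
    """Cut text into windows of chunk_size characters that overlap by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each new window advances
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]

# Synthetic text used purely for counting; swap in your own document.
sample_text = "x" * 2500

for size, overlap in [(1000, 100), (500, 50), (250, 0)]:
    pieces = fixed_size_chunks(sample_text, size, overlap)
    print(f"chunk_size={size}, chunk_overlap={overlap}: {len(pieces)} chunks")
```

Smaller chunks give the retriever more, narrower candidates to choose from, while larger chunks carry more surrounding context per hit; which works better depends on the document and the questions.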

Hopefully you see that RAG is a key technique for reducing hallucinations and keeping your GenAI applications as relevant as possible!

**Remember:** If you are using RAG (i.e. you are creating embeddings), your embeddings are created from text. Depending on your data source, this means you might need to convert it (tables, for example) into text first!
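
One simple way to turn tabular data into text is to write each row out as a short sentence before chunking and embedding. The sketch below assumes the table is already available as a list of dictionaries; the helper `row_to_sentence`, the column names, and the values are all made up for illustration.

```python
def row_to_sentence(row, subject_key):
    """Render one table row as a sentence, using subject_key's value as the subject."""
    subject = row[subject_key]
    facts = [f"{key} is {value}" for key, value in row.items() if key != subject_key]
    return f"{subject}: " + "; ".join(facts) + "."

# Hypothetical table rows (placeholder metrics, not figures from the report).
hypothetical_table = [
    {"metric": "Example metric A", "value": "10", "change": "+5%"},
    {"metric": "Example metric B", "value": "20", "change": "-2%"},
]

table_as_text = [row_to_sentence(row, "metric") for row in hypothetical_table]
for line in table_as_text:
    print(line)
```

Each resulting string can then be passed to the splitter and embedding steps just like any other text.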