# Building a RAG application from scratch using `LangChain` and `Bedrock`

Before we get started, I would like to thank to [Santiago](https://www.youtube.com/@underfitted) for his wonderful tutorial on `LangChain`, this notebook is based on his tutorial. 
Here is a high-level overview of the system we want to build:

<img src='./images/IMG_0354.jpg' width="1200">

## Setup

Create an `.env `file in the project directory using env.example as a reference. Populate the .env file with your Aurora PostgreSQL DB cluster details (this we will need later on):

```
PGVECTOR_DRIVER='psycopg2'
PGVECTOR_USER='<<Username>>'
PGVECTOR_PASSWORD='<<Password>>'
PGVECTOR_HOST='<<Aurora DB cluster host>>'
PGVECTOR_PORT=5432
PGVECTOR_DATABASE='<<DBName>>'
```

## Loading the `env` variables

Let's start by loading the environment variables we need to use.

In [1]:
import os
from dotenv import load_dotenv

load_dotenv()

True

## Setting up the model
Let's define the LLM model that we'll use as part of the workflow.

In [2]:
from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage


model = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0", model_kwargs={"temperature": 0.1})

We can test the model by asking a simple question.

In [3]:
messages = [
    HumanMessage(
        content="Who won the ICC Criket World Cup 2019?"
    )
]

model.invoke(messages)

AIMessage(content="The 2019 ICC Cricket World Cup was won by England. It was hosted in England and Wales.\n\nIn the final at Lord's Cricket Ground in London, England defeated New Zealand in a dramatic match that went to a Super Over tie-breaker after the scores were tied at the end of the regulation 50 overs per side.\n\nEngland scored 241/8 in their 50 overs, which New Zealand also scored to tie the match. In the Super Over, both teams scored 15 runs each. However, England was awarded the World Cup on a controversial boundary countback rule, having scored more boundaries (fours and sixes) during the match.\n\nIt was England's first ever Cricket World Cup title. New Zealand were the runners-up for the second consecutive World Cup after 2015. The player of the tournament was Kane Williamson of New Zealand.", additional_kwargs={'usage': {'prompt_tokens': 20, 'completion_tokens': 189, 'total_tokens': 209}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, 

The result from the model is an `AIMessage` instance containing the answer. We can extract this answer by chaining the model with an [output parser](https://python.langchain.com/docs/modules/model_io/output_parsers/).

Here is what chaining the model with an output parser looks like:

<img src='./images/IMG_0355.jpg' width="1200">

For this example, we'll use a simple `StrOutputParser` to extract the answer as a string.

In [4]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()
chain = model | parser

chain.invoke("Who won the ICC Criket World Cup 2019?")

"The 2019 ICC Cricket World Cup was won by England. It was hosted in England and Wales.\n\nIn the final at Lord's Cricket Ground in London, England defeated New Zealand in a dramatic match that went to a Super Over tie-breaker after the scores were tied after the regular 50 overs per side.\n\nEngland scored 241/8 in their 50 overs, which New Zealand also scored to tie the match. In the Super Over, both teams scored 15 runs each. However, England was awarded the World Cup on a controversial boundary countback rule, having scored more boundaries (fours and sixes) during their innings.\n\nIt was England's first ever Cricket World Cup title victory. New Zealand were the runners-up for the second consecutive World Cup after 2015. The player of the tournament was Kane Williamson of New Zealand."

## Introducing prompt templates

We want to provide the model with some context and the question. [Prompt templates](https://python.langchain.com/docs/modules/model_io/prompts/quick_start) are a simple way to define and reuse prompts.

In [5]:
from langchain_core.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you can't answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
response = prompt.format(context="Rashmi is Reyaan's mom", question="Who is Rashmi's son ?")

print(response)

Human: 
Answer the question based on the context below. If you can't answer the question, reply "I don't know".

Context: Rashmi is Reyaan's mom

Question: Who is Rashmi's son ?



We can now chain the prompt with the model and the output parser.

<img src='./images/IMG_0356.jpg' width="1200">

In [6]:
chain = prompt | model | parser

response = chain.invoke({
                            "context": "Rashmi is Reyaan's mom",
                            "question": "Who is Rashmi's son ?"
                        })

print(response)

Based on the given context that Rashmi is Reyaan's mom, the answer to "Who is Rashmi's son?" is Reyaan.


## Combining chains

We can combine different chains to create more complex workflows. For example, let's create a second chain that translates the answer from the first chain into a different language.

Let's start by creating a new prompt template for the translation chain:

In [7]:
translation_prompt = ChatPromptTemplate.from_template(
                                                        "Translate {answer} to {language}"
                                                    )

We can now create a new translation chain that combines the result from the first chain with the translation prompt.

Here is what the new workflow looks like:

<img src='./images/IMG_0357.jpg' width="1200">

In [8]:
from operator import itemgetter

translation_chain = (
                        {"answer": chain, "language": itemgetter("language")} | translation_prompt | model | parser
                    )

translation_chain.invoke(
        {
            "context": "John's brother is named Michael. He also has two sisters, Alice and Viktoria.",
            "question": "How many siblings does John have in total?",
            "language": "Hindi"
        }
)

'दिए गए संदर्भ के आधार पर, जॉन के कुल तीन भाई-बहन हैं: एक भाई माइकल और दो बहनें एलिस और विक्टोरिया।'

## Transcribing the YouTube Video

The context we want to send the model comes from a YouTube video. Let's download the video and transcribe it using Amazon Transcribe

In [9]:
YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=lB_0hR5s41Y&ab_channel=BeerBiceps"
S3_BUCKET = 'ml-dl-demo-data'

In [10]:
from utils import transcribe_video

transcribe_video(s3_bucket_name=S3_BUCKET, youtube_video_url=YOUTUBE_VIDEO)

Transcription file already exists.


Let's read the transcription and display the first few characters to ensure everything works as expected.

In [11]:
import json

with open("transcription.txt", "r") as file:
    transcription = json.loads(file.read())
    transcription = transcription['results']['transcripts'][0]['transcript']

transcription[0:100]

"You're a multibillionaire European founder who's moved to Gandhinagar. Yes. Why did you choose Gujar"

## Using the entire transcription as context

If we try to invoke the chain using the transcription as context, the model will return an error because the context is too long.

Large Language Models support limitted context sizes. The video we are using is too long for the model to handle, so we need to find a different solution.

In [12]:
len(transcription)

114926

In [13]:
chain.invoke({"context": transcription,
              "question": "What matters when selecting a location for a business in India ?"
            })

'According to Fabian, the most important factor when selecting a location for a business in India is not the location itself, but finding the right person/director to lead the operations there. He says:\n\n"The key to succeed a new company is not the location, the key is the director. This is the most important thing. You can have a good director in a bad location. It will work. You can have a good location, average director, it\'s gonna be painful."\n\nHe explains that when expanding to a new location, they first identify the best person/employee within their company who can take the lead, and then open the office wherever that person is based or prefers to be. The location itself is secondary to having the right leadership in place.\n\nHe gives examples of opening offices in places like Buffalo, New York and Gandhinagar, Gujarat not because those were targeted locations, but because they had the right people to lead operations there. The focus is on finding talented directors/manager

## Splitting the transcription



Since we can't use the entire transcription as the context for the model, a potential solution is to split the transcription into smaller chunks. We can then invoke the model using only the relevant chunks to answer a particular question:

<img src='./images/IMG_0358.jpg' width="1200">

Let's start by loading the transcription in memory:

In [14]:
from langchain_community.document_loaders import TextLoader

loader = TextLoader("transcription.txt")
text_documents = loader.load()

There are many different ways to split a document. For this example, we'll use a simple splitter that splits the document into chunks of a fixed size. Check [Text Splitters](https://python.langchain.com/docs/modules/data_connection/document_transformers/) for more information about different approaches to splitting documents.

For illustration purposes, let's split the transcription into chunks of 100 characters with an overlap of 20 characters and display the first few chunks:

In [15]:
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=100, chunk_overlap=20)
text_splitter.split_documents(text_documents)[:5]

[Document(page_content='{"jobName":"Multi-BillionairesJourneyInIndia-LeadershipCultureAndOpportunityOdooTRS386.mp41717446851', metadata={'source': 'transcription.txt'}),
 Document(page_content='TRS386.mp41717446851","accountId":"507922848584","status":"COMPLETED","results":{"transcripts":[{"tr', metadata={'source': 'transcription.txt'}),
 Document(page_content='{"transcripts":[{"transcript":"You\'re', metadata={'source': 'transcription.txt'}),
 Document(page_content="a multibillionaire European founder who's moved to Gandhinagar. Yes. Why did you choose Gujarat? In", metadata={'source': 'transcription.txt'}),
 Document(page_content='choose Gujarat? In India? We have a ruler to do is we never go to tier one cities. We always go to', metadata={'source': 'transcription.txt'})]

For our specific application, let's use 1000 characters instead:

In [16]:
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(text_documents)

## Finding the relevant chunks



Given a particular question, we need to find the relevant chunks from the transcription to send to the model. Here is where the idea of **embeddings** comes into play.

An embedding is a mathematical representation of the semantic meaning of a word, sentence, or document. It's a projection of a concept in a high-dimensional space. Embeddings have a simple characteristic: The projection of related concepts will be close to each other, while concepts with different meanings will lie far away. 

To provide with the most relevant chunks, we can use the embeddings of the question and the chunks of the transcription to compute the similarity between them. We can then select the chunks with the highest similarity to the question and use them as the context for the model:

<img src='./images/IMG_0359.jpg' width="1200">

Let's generate embeddings for an arbitrary query:

In [17]:
from langchain_community.embeddings import BedrockEmbeddings

embeddings = BedrockEmbeddings()
embedded_query = embeddings.embed_query("Berlin is in Germany")

print(f"Embedding length: {len(embedded_query)}")
print(embedded_query[:10])

Embedding length: 1536
[1.2890625, 0.4453125, 0.28320312, 0.3984375, 0.050048828, -0.123046875, 0.58984375, -0.0007247925, -0.23535156, 0.48046875]


To illustrate how embeddings work, let's first generate the embeddings for two different sentences:

In [18]:
sentence1 = embeddings.embed_query("Welcome to Frankfurt")
sentence2 = embeddings.embed_query("This is a table")

We can now compute the similarity between the query and each of the two sentences. The closer the embeddings are, the more similar the sentences will be.

We can use [Cosine Similarity](https://en.wikipedia.org/wiki/Cosine_similarity) to calculate the similarity between the query and each of the sentences:

In [19]:
from sklearn.metrics.pairwise import cosine_similarity

query_sentence1_similarity = cosine_similarity([embedded_query], [sentence1])[0][0]
query_sentence2_similarity = cosine_similarity([embedded_query], [sentence2])[0][0]

query_sentence1_similarity, query_sentence2_similarity

(0.6138958023127881, 0.2699050016319834)

## Setting up a Vector Store

We need an efficient way to store document chunks, their embeddings, and perform similarity searches at scale. To do this, we'll use a **vector store**.

A vector store is a database of embeddings that specializes in fast similarity searches. 

<img src='./images/IMG_0360.jpg' width="1200">

To understand how a vector store works, let's create one in memory and add a few embeddings to it:

### Storing vectors in Amazon Aurora using `pgvector`

<div style="background-color: #f0f8ff; padding: 10px; border-radius: 5px; font-size: 1.1em;">
<b>Prerequisite:</b>
<ol>
    <li>Have an <b>Aurora cluster ready</b>.</li>
    <li>Create the <b>pgvector extension</b> on your Aurora PostgreSQL database (DB) cluster:
        <pre style="font-size: 1.1em;"><code>
        CREATE EXTENSION vector;
        </code></pre>
    </li>
</ol>
</div>


We can connect to the Aurora cluster and check 


```sql
-- SHOW the current database
SELECT current_database();

-- SHOW all the tables in the database
SELECT table_name
FROM postgres.information_schema.tables
WHERE table_schema = 'public';
```

In [20]:
from langchain_community.vectorstores.pgvector import PGVector, DistanceStrategy

# Loading all env variables 
load_dotenv()

COLLECTION_NAME = 'rag-intro-on-aws'

# Connection String
CONNECTION_STRING = PGVector.connection_string_from_db_params(driver = os.getenv("PGVECTOR_DRIVER"),
                                                              user = os.getenv("PGVECTOR_USER"),                                      
                                                              password = os.getenv("PGVECTOR_PASSWORD"),                                  
                                                              host = os.getenv("PGVECTOR_HOST"),                                            
                                                              port = os.getenv("PGVECTOR_PORT"),                                          
                                                              database = os.getenv("PGVECTOR_DATABASE"),
                                                              )  

# Text Embedding model
embeddings = BedrockEmbeddings()

# Creating the VectorDB store instance   
vectorstore1 = PGVector(collection_name=COLLECTION_NAME,
                           connection_string=CONNECTION_STRING,
                           embedding_function=embeddings,
                           distance_strategy = DistanceStrategy.EUCLIDEAN,
                           use_jsonb = True
                          )

  warn_deprecated(


In [21]:
vectorstore1.add_texts([
                    "Color of the bird is red"
                    "The cat slept by the fire.",
                    "We went to the park after school.",
                    "I finished my homework early.",
                    "The bird sang a beautiful song.",
                    "She read a book before bed.",
                    "Mary has two siblings",
                    "Song was in Spanish", 
                    ])

['a925f488-3d46-4aea-bee4-d23eb8788d3c',
 'f884aa20-5720-4e4b-a51f-e0d79a788e78',
 '84807713-090e-44e3-9fee-923453cac108',
 '049d3b43-9c00-4127-bd4f-a83f73351461',
 '21651cd9-9dac-45fe-ac08-4e08943a05cb',
 '5e279290-6c6c-473d-a676-3fdd045a46a7',
 '5c44bcdf-6fdb-4598-b508-4260acd5fdfc']

We can now query the vector store to find the most similar embeddings to a given query:

In [22]:
vectorstore1.similarity_search_with_score(query="What the bird was singing", k=3)

[(Document(page_content='The bird sang a beautiful song.'),
  12.719913663751074),
 (Document(page_content='The bird sang a beautiful song.'),
  12.719913663751074),
 (Document(page_content='The bird sang a beautiful song.'),
  12.719913663751074)]

## Connecting the vector store to the chain

We can use the vector store to find the most relevant chunks from the transcription to send to the model. Here is how we can connect the vector store to the chain:

<img src='./images/IMG_0361.jpg' width="1200">

We need to configure a [Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/). The retriever will run a similarity search in the vector store and return the most similar documents back to the next step in the chain.

We can get a retriever directly from the vector store we created before: 

In [23]:
retriever1 = vectorstore1.as_retriever()
retriever1.invoke("Whats the color of the bird who was singing?")

[Document(page_content='Color of the bird is redThe cat slept by the fire.'),
 Document(page_content='Color of the bird is redThe cat slept by the fire.'),
 Document(page_content='Color of the bird is redThe cat slept by the fire.'),
 Document(page_content='The bird sang a beautiful song.')]

Our prompt expects two parameters, "context" and "question." We can use the retriever to find the chunks we'll use as the context to answer the question.

We can create a map with the two inputs by using the [`RunnableParallel`](https://python.langchain.com/docs/expression_language/how_to/map) and [`RunnablePassthrough`](https://python.langchain.com/docs/expression_language/how_to/passthrough) classes. This will allow us to pass the context and question to the prompt as a map with the keys "context" and "question."

In [24]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

setup = RunnableParallel(context=retriever1, question=RunnablePassthrough())
setup.invoke("Whats the color of the bird who was singing?")

{'context': [Document(page_content='Color of the bird is redThe cat slept by the fire.'),
  Document(page_content='Color of the bird is redThe cat slept by the fire.'),
  Document(page_content='Color of the bird is redThe cat slept by the fire.'),
  Document(page_content='The bird sang a beautiful song.')],
 'question': 'Whats the color of the bird who was singing?'}

Let's now add the setup map to the chain and run it:



In [25]:
chain = setup | prompt | model | parser
chain.invoke("Whats the color of the bird who was singing?")

'Based on the given context, the color of the bird is red. This is mentioned in the first three documents which state "Color of the bird is red".'

Let's invoke the chain using another example:

In [26]:
chain.invoke("Does Mary has any brother or sister ?")

'Based on the given context, Mary has two siblings. So the answer is yes, Mary has at least one brother or sister.'

In [27]:
## Loading transcription into the vector store


We initialized the vector store with a few random strings. Let's create a new vector store using the chunks from the video transcription.

## Setting up Aurora

In [28]:
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores.pgvector import PGVector
import os
from dotenv import load_dotenv

# Loading all env variables
load_dotenv()

# Load the text from the file
loader = TextLoader("transcription.txt")
documents = loader.load()

# Split the text into smaller chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=10000, chunk_overlap=200)
docs = text_splitter.split_documents(documents)

# Initialize the embeddings
embeddings = BedrockEmbeddings()

# Set the collection name
COLLECTION_NAME = "rag-intro-yt"

# Connection String
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver=os.getenv("PGVECTOR_DRIVER"),
    user=os.getenv("PGVECTOR_USER"),
    password=os.getenv("PGVECTOR_PASSWORD"),
    host=os.getenv("PGVECTOR_HOST"),
    port=os.getenv("PGVECTOR_PORT"),
    database=os.getenv("PGVECTOR_DATABASE"),
)

# Create the PGVector instance from the documents
db = PGVector.from_documents(
                                embedding=embeddings,
                                documents=docs,
                                collection_name=COLLECTION_NAME,
                                connection_string=CONNECTION_STRING,
                                use_jsonb = True
                            )

Let's now run a similarity search on Aurora to make sure everything works:

In [29]:
db.similarity_search("Can you share the detail of the speaker's journey from starting as a coder to becoming a successful entrepreneur, including the pivot in his business model?")[:3]

[Document(page_content="that. You like how you're working, right? The reason is that um uh public companies tend to refocus on the short term, you know, you have to uh publish earning codes with the, the sales number and if the numbers are good, everyone is happy, they buy the shares. If numbers are bad, people are not happy and your employees are frustrated because they have shares and it's bad. So public companies have a tendency to focus on the short term to saves, saves of the moments of the quarter and so on. I don't want that the success of FU is always to build for the long term. And I don't want to see on to look for the short term or the sales number of the quarter. One, the other thing is I'm so much focused on productivity and efficiency. Uh When you get public, you need extra layers of reporting transparency, uh anything like that. Uh So second reason I don't want that I want to be super efficient, decide right away instead of asking the board of directors. Um And third, I 

Let's setup the new chain using Aurora as the vector store:

In [30]:
chain = (
    {"context": db.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)

response = chain.invoke("What are the main challenges and advantages of doing business in India, including insights on market sensitivity, price, and speed of decision-making?")

In [31]:
print(response)

Based on the context provided, here are some of the main challenges and advantages of doing business in India mentioned:

Challenges:
1. The Indian market is very price sensitive. Indian customers focus a lot on getting the cheapest option, sometimes at the expense of quality or efficient service.
2. Many Indian companies try to do everything themselves initially instead of buying services, unlike in the US where companies readily buy services.
3. There is a mindset of saving money by using more human labor instead of automating processes, even if automation could be more efficient.
4. Marketing to Indian businesses has been a challenge since the company was previously known more among developers than businessmen.

Advantages: 
1. India is a fast-deciding market compared to Europe/Africa, with sales cycles being shorter for small companies.
2. Indians want to move with speed and go for the cheapest option, which can be advantageous for affordable products/services.
3. India has a big g

## Resources 

- [Building a RAG application from scratch using Python - By Santiago](https://www.youtube.com/watch?v=BrsocJb-fAo&t=548s&ab_channel=Underfitted)
- [Vector Embeddings and RAG Demystified: Leveraging Amazon Bedrock, Aurora, and LangChain - Part 1](https://community.aws/content/2gvh6fQM4mJQduLye3mHlCNvPxX/vector-embeddings-and-rag-demystified)
- [Vector Embeddings and RAG Demystified: Leveraging Amazon Bedrock, Aurora, and LangChain - Part 2](https://community.aws/content/2gvh8oJzNrM4vxdZDd903zcEFJc/vector-embeddings-and-rag-demystified-2)