# Building a RAG application from scratch using `LangChain` and `Bedrock`

Before we get started, I would like to thank [Santiago](https://www.youtube.com/@underfitted) for his wonderful tutorial on `LangChain`, which this notebook is based on.
Here is a high-level overview of the system we want to build:

<img src='./images/IMG_0354.jpg' width="1200">

## Setup

Create a `.env` file in the project directory, using `env.example` as a reference. Populate it with your Aurora PostgreSQL DB cluster details (we will need these later on):

```
PGVECTOR_DRIVER='psycopg2'
PGVECTOR_USER='<<Username>>'
PGVECTOR_PASSWORD='<<Password>>'
PGVECTOR_HOST='<<Aurora DB cluster host>>'
PGVECTOR_PORT=5432
PGVECTOR_DATABASE='<<DBName>>'
```
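Once these values are loaded into the environment (the notebook does this with `load_dotenv()`), it can help to fail fast if any of them is missing. The following is a minimal sketch using only the standard library; `REQUIRED_VARS` and `missing_pgvector_settings` are hypothetical names introduced for illustration and are not part of the notebook or of any library.

```python
import os

# Variable names taken from the .env template above.
REQUIRED_VARS = (
    "PGVECTOR_DRIVER",
    "PGVECTOR_USER",
    "PGVECTOR_PASSWORD",
    "PGVECTOR_HOST",
    "PGVECTOR_PORT",
    "PGVECTOR_DATABASE",
)


def missing_pgvector_settings(env=None):
    """Return the required variable names that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_pgvector_settings()
if missing:
    print("Missing settings:", ", ".join(missing))
else:
    print("All PGVector settings are present.")
```

Passing a plain dictionary instead of relying on `os.environ` makes the check easy to try out without touching real credentials.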

## Loading the `env` variables

Let's start by loading the environment variables we need to use.

In [1]:
import os
from dotenv import load_dotenv
from pprint import pprint as pp
import boto3
import json

load_dotenv()

True

In [2]:
import langchain
langchain.__version__

'0.2.1'

## How to access any LLM/FM from `Bedrock`?

In [3]:
# Control-plane client: lists and manages the available models
bedrock = boto3.client(
    service_name='bedrock',
    region_name='us-west-2',
)

# Runtime client: sends inference requests to a model
bedrock_runtime = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-west-2',
)

In [4]:
presently_supported_models = [model['modelId'] for model in bedrock.list_foundation_models()['modelSummaries']]
pp(presently_supported_models)

['amazon.titan-tg1-large',
 'amazon.titan-embed-g1-text-02',
 'amazon.titan-text-lite-v1:0:4k',
 'amazon.titan-text-lite-v1',
 'amazon.titan-text-express-v1:0:8k',
 'amazon.titan-text-express-v1',
 'amazon.titan-text-agile-v1',
 'amazon.titan-embed-text-v1:2:8k',
 'amazon.titan-embed-text-v1',
 'amazon.titan-embed-text-v2:0:8k',
 'amazon.titan-embed-text-v2:0',
 'amazon.titan-embed-image-v1:0',
 'amazon.titan-embed-image-v1',
 'amazon.titan-image-generator-v1:0',
 'amazon.titan-image-generator-v1',
 'stability.stable-diffusion-xl-v1:0',
 'stability.stable-diffusion-xl-v1',
 'ai21.j2-grande-instruct',
 'ai21.j2-jumbo-instruct',
 'ai21.j2-mid',
 'ai21.j2-mid-v1',
 'ai21.j2-ultra',
 'ai21.j2-ultra-v1:0:8k',
 'ai21.j2-ultra-v1',
 'anthropic.claude-instant-v1:2:100k',
 'anthropic.claude-instant-v1',
 'anthropic.claude-v2:0:18k',
 'anthropic.claude-v2:0:100k',
 'anthropic.claude-v2:1:18k',
 'anthropic.claude-v2:1:200k',
 'anthropic.claude-v2:1',
 'anthropic.claude-v2',
 'anthropic.claude-3-s

In [5]:
prompt_data = """Explain batch normalization and why it's useful."""

body = {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 1024,
            "messages": [
                 {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": prompt_data
                        }
                    ]
                }
            ],
        }

body = json.dumps(body) # Encode body as JSON string

model_id = "anthropic.claude-3-sonnet-20240229-v1:0"
accept = 'application/json'
contentType = 'application/json'

In [6]:
response = bedrock_runtime.invoke_model(body=body,
                                        modelId=model_id, 
                                        accept=accept, 
                                        contentType=contentType)

response_body = json.loads(response.get("body").read())

for output in response_body.get("content", []):
    print(output["text"])

Batch normalization is a technique used in deep neural networks to improve the stability and performance of the training process. It was introduced by Sergey Ioffe and Christian Szegedy in 2015. Here's an explanation of batch normalization and why it's useful:

1. **Internal Covariate Shift**: During the training of deep neural networks, the distribution of the inputs to a layer can change as the parameters of the previous layers are updated. This phenomenon is known as "internal covariate shift." It can slow down the training process and make it difficult for the network to converge.

2. **Normalization of Inputs**: Batch normalization addresses this issue by normalizing the inputs to each layer. It does this by subtracting the batch mean and dividing by the batch standard deviation. This ensures that the inputs to each layer have a mean of zero and a standard deviation of one.

3. **Improved Gradient Flow**: By normalizing the inputs, batch normalization allows for better gradient fl
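The request and response shapes used above can be exercised without calling Bedrock. This sketch wraps them in two hypothetical helpers, `build_claude_body` and `extract_text`; the sample response is hand-written for illustration and only mirrors the `content` list that the loop above reads.

```python
import json


def build_claude_body(prompt, max_tokens=1024):
    """Serialize a single-turn user prompt in the Messages format used above."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {"role": "user", "content": [{"type": "text", "text": prompt}]}
        ],
    }
    return json.dumps(body)


def extract_text(response_body):
    """Join the text of every text block in the response's `content` list."""
    return "".join(
        block["text"]
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )


# Hand-written stand-in for a parsed response; not real model output.
sample_response = {"content": [{"type": "text", "text": "Hello from the model."}]}

print(json.loads(build_claude_body("Hi"))["messages"][0]["role"])
print(extract_text(sample_response))
```

Keeping the body construction in one function makes it easier to reuse the same request shape for different prompts.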

## Using `LangChain`
Let's define the LLM that we'll use as part of the workflow.

In [7]:
from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage


model = ChatBedrock(model_id="anthropic.claude-3-sonnet-20240229-v1:0", model_kwargs={"temperature": 0.1})

We can test the model by asking a simple question.

In [8]:
prompt_data = """Who won the ICC Cricket World Cup 2019?"""
model.invoke(prompt_data)

AIMessage(content="The 2019 ICC Cricket World Cup was won by England. It was hosted in England and Wales.\n\nIn the final at Lord's Cricket Ground in London, England defeated New Zealand in a dramatic match that went to a Super Over tie-breaker after the scores were tied at the end of the regulation 50 overs per side.\n\nEngland scored 241/8 in their 50 overs, which New Zealand also scored to tie the match. In the Super Over, both teams scored 15 runs each. However, England was awarded the World Cup title on the basis of having scored more boundaries (fours and sixes) during the match.\n\nIt was England's first ever Cricket World Cup title victory. New Zealand finished as the runners-up for the second consecutive World Cup after 2015.", additional_kwargs={'usage': {'prompt_tokens': 20, 'completion_tokens': 174, 'total_tokens': 194}, 'stop_reason': 'end_turn', 'model_id': 'anthropic.claude-3-sonnet-20240229-v1:0'}, response_metadata={'usage': {'prompt_tokens': 20, 'completion_tokens': 1

In [10]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()
chain = model | parser

print(chain.invoke("Who won the ICC Cricket World Cup 2019?"))

The 2019 ICC Cricket World Cup was won by England. It was hosted in England and Wales.

In the final at Lord's Cricket Ground in London, England defeated New Zealand in a dramatic match that went to a Super Over tie-breaker after the scores were tied at the end of the regulation 50 overs per side.

England scored 241/8 in their 50 overs, which New Zealand also scored to tie the match. In the Super Over, both teams scored 15 runs each. However, England was awarded the World Cup title on the basis of having scored more boundaries (fours and sixes) during the match.

It was England's first ever Cricket World Cup title victory. New Zealand finished as the runners-up for the second consecutive World Cup after 2015.
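The `model | parser` expression composes two steps so that the output of the first feeds the second. The toy class below illustrates that idea with Python's `__or__` operator. `Step`, `fake_model`, and `fake_parser` are hypothetical stand-ins, and this is not a description of how LangChain implements its runnables internally.

```python
class Step:
    """Wrap a function so that steps can be chained with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other):
        # Build a new step that runs self first, then feeds the result to other.
        return Step(lambda value: other.invoke(self.invoke(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-in for a chat model: returns a dict shaped loosely like a message.
fake_model = Step(lambda prompt: {"content": f"Echo: {prompt}"})
# Stand-in for an output parser: pulls the text out of the message.
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("hello"))  # prints "Echo: hello"
```

In the same way, the parser in the real chain turns the `AIMessage` shown earlier into a plain string.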


## Transcribing the YouTube Video

The context we want to send the model comes from a YouTube video. Let's download the video and transcribe it using Amazon Transcribe.

<img src='./images/IMG_0354.jpg' width="1200">

In [11]:
YOUTUBE_VIDEO = "https://www.youtube.com/watch?v=lB_0hR5s41Y&ab_channel=BeerBiceps"
S3_BUCKET = 'ml-dl-demo-data'

from utils import transcribe_video

transcribe_video(s3_bucket_name=S3_BUCKET, youtube_video_url=YOUTUBE_VIDEO)

Transcription file already exists.


Let's read the transcription and display the first few characters to ensure everything works as expected.

In [12]:
import json

with open("transcription.txt", "r") as file:
    transcription = json.loads(file.read())
    transcription = transcription['results']['transcripts'][0]['transcript']

transcription[0:100]

"You're a multibillionaire European founder who's moved to Gandhinagar. Yes. Why did you choose Gujar"

In [13]:
len(transcription)

114926

In [14]:
from langchain_core.prompts import ChatPromptTemplate

template = """
Answer the question based on the context below. If you can't answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = ChatPromptTemplate.from_template(template)
chain = prompt | model | parser

In [17]:
# Sending the entire transcript as context works, but it is slow (not recommended)
response = chain.invoke({"context": transcription,
                         "question": "What matters when selecting a location for a business in India?"
                        })

print(response)

According to Fabian, the most important factor when selecting a location for a business in India is not the location itself, but finding the right person/leader to head the operations there. He says:

"The key to succeed a new company is not the location, the key is the director. This is the most important thing. You can have a good director in a bad location, it will work. You can have a good location, average director, it's gonna be painful."

He explains that when expanding to a new city/region in India, they first identify a strong leader/director from within the company who can lead the new operations. The location is secondary - they go wherever that capable person is based or willing to relocate to. 

For example, he chose Gandhinagar not because of the city itself, but because the director he wanted to work with happened to be based there initially. The focus is on finding the right leadership talent first, and then establishing the operations in whatever location makes sense f

### Splitting the transcription



Sending the entire transcription as the context is slow, and a longer transcription may not fit in the model's context window at all. A potential solution is to split the transcription into smaller chunks. We can then invoke the model using only the relevant chunks to answer a particular question:

<img src='./images/IMG_0358.jpg' width="1200">

In [18]:
# 1. Load using TextLoader
from langchain_community.document_loaders import TextLoader

loader = TextLoader("transcription.txt")
text_documents = loader.load()

# 2. Split using RecursiveCharacterTextSplitter
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
documents = text_splitter.split_documents(text_documents)
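To see what `chunk_size` and `chunk_overlap` mean, here is a simplified sketch that cuts a string into fixed-size windows sharing a few characters with their neighbor. The helper `split_with_overlap` is hypothetical. `RecursiveCharacterTextSplitter` additionally tries to break on separators such as paragraph and line breaks, so its chunks will not match this output exactly.

```python
def split_with_overlap(text, chunk_size=1000, chunk_overlap=20):
    """Cut `text` into windows of `chunk_size` characters, each sharing
    `chunk_overlap` characters with the previous window."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_with_overlap(sample, chunk_size=10, chunk_overlap=2):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears, at least partly, in both neighboring chunks.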

### Finding the relevant chunks



Given a particular question, we need to find the relevant chunks from the transcription to send to the model. Here is where the idea of **embeddings** comes into play.

An embedding is a mathematical representation of the semantic meaning of a word, sentence, or document. It's a projection of a concept in a high-dimensional space. Embeddings have a simple characteristic: The projection of related concepts will be close to each other, while concepts with different meanings will lie far away. 

To provide the model with the most relevant chunks, we can compute the embeddings of the question and of each chunk of the transcription, and then measure the similarity between them. We can then select the chunks with the highest similarity to the question and use them as the context for the model:

<img src='./images/IMG_0359.jpg' width="1200">
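Cosine similarity is one common way to measure how close two embeddings are. The sketch below uses made-up three-dimensional vectors purely for illustration; real embedding models return much longer vectors, and the variable names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Made-up "embeddings"; the values carry no real meaning.
question = [0.9, 0.1, 0.0]
chunk_about_business = [0.8, 0.2, 0.1]
chunk_about_weather = [0.0, 0.1, 0.9]

print(cosine_similarity(question, chunk_about_business))
print(cosine_similarity(question, chunk_about_weather))
```

The first score is much higher than the second, so the business chunk would be selected as context for this question.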

### Setting up a Vector Store

We need an efficient way to store document chunks, their embeddings, and perform similarity searches at scale. To do this, we'll use a **vector store**.

A vector store is a database of embeddings that specializes in fast similarity searches. 

<img src='./images/IMG_0360.jpg' width="1200">
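Conceptually, a vector store keeps each chunk next to its embedding and returns the closest chunks for a query vector. The class below is a brute-force, in-memory sketch of that idea. `ToyVectorStore` is a hypothetical name, and the vectors are made up. A real vector store persists the data and can use indexes to avoid comparing the query against every stored vector.

```python
import math


class ToyVectorStore:
    """Keep (text, vector) pairs in a list and search them by brute force."""

    def __init__(self):
        self._items = []

    def add(self, text, vector):
        self._items.append((text, vector))

    def similarity_search(self, query_vector, k=2):
        """Return the k stored texts whose vectors are most similar to the query."""

        def score(item):
            _, vector = item
            dot = sum(x * y for x, y in zip(query_vector, vector))
            norms = math.hypot(*query_vector) * math.hypot(*vector)
            return dot / norms  # cosine similarity

        ranked = sorted(self._items, key=score, reverse=True)
        return [text for text, _ in ranked[:k]]


store = ToyVectorStore()
store.add("pricing in India", [0.9, 0.1])
store.add("hiring directors", [0.2, 0.8])
store.add("GST compliance", [0.7, 0.3])

print(store.similarity_search([1.0, 0.0], k=2))
```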


In [21]:
from langchain_community.embeddings import BedrockEmbeddings
from langchain_community.vectorstores.pgvector import PGVector
import os

# Initialize the embeddings
embeddings = BedrockEmbeddings()

# Set the collection name
COLLECTION_NAME = "rag-intro-yt"

# Build the connection string from the values in the .env file
CONNECTION_STRING = PGVector.connection_string_from_db_params(
    driver=os.getenv("PGVECTOR_DRIVER"),
    user=os.getenv("PGVECTOR_USER"),
    password=os.getenv("PGVECTOR_PASSWORD"),
    host=os.getenv("PGVECTOR_HOST"),
    port=os.getenv("PGVECTOR_PORT"),
    database=os.getenv("PGVECTOR_DATABASE"),
)

# Create the PGVector instance from the documents
db = PGVector.from_documents(
    embedding=embeddings,
    documents=documents,
    collection_name=COLLECTION_NAME,
    connection_string=CONNECTION_STRING,
    use_jsonb=True,
)
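For reference, a connection string of this kind is typically a SQLAlchemy-style URL. The sketch below assembles one by hand with a hypothetical helper, `build_pg_url`, and placeholder values. It assumes the usual `postgresql+driver://user:password@host:port/database` layout and makes no claim about whether the LangChain helper escapes special characters in the same way.

```python
from urllib.parse import quote_plus


def build_pg_url(driver, user, password, host, port, database):
    """Assemble a SQLAlchemy-style PostgreSQL URL, escaping the credentials."""
    return (
        f"postgresql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}:{port}/{database}"
    )


# Placeholder values for illustration only.
print(build_pg_url("psycopg2", "admin", "p@ss/word", "db.example.com", 5432, "rag"))
```

Escaping matters when a password contains characters such as `@` or `/`, which would otherwise be read as part of the URL structure.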

Let's now run a similarity search on Aurora to make sure everything works:

### Similarity Search 

In [22]:
db.similarity_search("Can you share the detail of the speaker's journey from starting as a coder to becoming a successful entrepreneur, including the pivot in his business model?")[:3]

[Document(page_content='to and IIII I love business. So I was reading the 223 books, uh, about management psychology to develop my business every week. Um, I did a lot of things and this one which is to do work better than the others. And so I started alone after two years, I had one employee after three years, 34 or five, this is in 2004. That was in 2002. Yes, when I was still a student and then I grew, uh organically bootstrapped the company. Uh I started to sell services on this software. So the software is, is a management software. At the beginning, I was selling services. So I asked Ken, what do you want? And I was developing everything and I did that until 2 2010. And at that time, I had a software that had a lot of features because I developed everything the customer would asked for. So it was full of feature but ugly complex uh and everything. Uh And at that time, I had uh 100 employees. Um So I decided to do a pivot in the business me and say we cannot do service and develop

We can use the vector store to find the most relevant chunks from the transcription to send to the model. Here is how we can connect the vector store to the chain:

<img src='./images/IMG_0361.jpg' width="1200">

We need to configure a [Retriever](https://python.langchain.com/docs/modules/data_connection/retrievers/). The retriever will run a similarity search in the vector store and return the most similar documents back to the next step in the chain.

We can get a retriever directly from the vector store we created before by calling `db.as_retriever()`.

### Retrieval

Let's set up the new chain using Aurora as the vector store:

In [23]:
from langchain_core.runnables import RunnableParallel, RunnablePassthrough

chain = (
    {"context": db.as_retriever(), "question": RunnablePassthrough()}
    | prompt
    | model
    | parser
)

response = chain.invoke("What are the main challenges and advantages of doing business in India, including insights on market sensitivity, price, and speed of decision-making?")
print(response)

Based on the context provided, here are the main challenges and advantages of doing business in India mentioned:

Advantages:
1. India is an open market and easy to do business in, even easier than the US according to the speaker.
2. Indian customers are willing to use products that may not be perfect, unlike Americans who are very picky.
3. The decision-making process is faster in India compared to Europe or Africa.
4. India has a large pool of developers that can provide services at an affordable cost.

Challenges:
1. The Indian market is very price-sensitive, and customers prioritize affordability over quality.
2. Indian customers often think they can do everything themselves and are reluctant to buy services, preferring to just buy the software.
3. Adapting products and services to meet Indian market needs, such as pricing in rupees, complying with GST and other regulations.
4. Building brand awareness and marketing reach, as the speaker's company was initially known only among dev
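The dictionary at the head of the chain sends the same input to both keys: the retriever turns the question into context, while the passthrough forwards the question unchanged. The sketch below mimics that flow with plain functions. `toy_retriever` and `build_prompt` are hypothetical, the chunks are canned, and joining the chunks into one string is an addition made here for readability rather than something the chain above does explicitly.

```python
TEMPLATE = (
    "Answer the question based on the context below.\n\n"
    "Context: {context}\n\n"
    "Question: {question}\n"
)


def toy_retriever(question):
    """Stand-in retriever: returns canned chunks regardless of the question."""
    return ["India is a price-sensitive market.", "Decisions are made quickly."]


def build_prompt(question, retriever=toy_retriever):
    # Fan the same input out to both keys, as the dict at the head of the chain does.
    inputs = {"context": retriever(question), "question": question}
    # Join the chunks into one block of text before filling the template.
    context = "\n\n".join(inputs["context"])
    return TEMPLATE.format(context=context, question=inputs["question"])


print(build_prompt("What are the challenges of doing business in India?"))
```

The resulting string is what a model would receive: only the retrieved chunks, not the whole transcription.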

## Resources 

- [Building a RAG application from scratch using Python - By Santiago](https://www.youtube.com/watch?v=BrsocJb-fAo&t=548s&ab_channel=Underfitted)
- [Vector Embeddings and RAG Demystified: Leveraging Amazon Bedrock, Aurora, and LangChain - Part 1](https://community.aws/content/2gvh6fQM4mJQduLye3mHlCNvPxX/vector-embeddings-and-rag-demystified)
- [Vector Embeddings and RAG Demystified: Leveraging Amazon Bedrock, Aurora, and LangChain - Part 2](https://community.aws/content/2gvh8oJzNrM4vxdZDd903zcEFJc/vector-embeddings-and-rag-demystified-2)