# Retrieval And Generation

## What is RAG?

Allows LLM to have context of your propriety documents along with your prompt

In [1]:
import langchain
print(langchain.__version__)

0.3.20


### Load API KEYS

In [2]:
import os 
from dotenv import load_dotenv

load_dotenv()

OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

# Setup LangSmith
os.environ["LANGSMITH_TRACING_V2"] = "true"
os.environ["LANGSMITH_ENDPOINT"] = 'https://api.smith.langchain.com'
os.environ["LANGSMITH_PROJECT"] = "RAG_Chatbot"
LANGSMITH_API_KEY = os.getenv("LANGSMITH_API_KEY")


### Sample Response

In [3]:
from langchain_groq import ChatGroq

llm = ChatGroq(model="llama-3.3-70b-versatile")
llm_response = llm.invoke("Tell me a joke")

llm_response

AIMessage(content="Here's one:\n\nWhat do you call a fake noodle?\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another one?", additional_kwargs={}, response_metadata={'token_usage': {'completion_tokens': 32, 'prompt_tokens': 39, 'total_tokens': 71, 'completion_time': 0.116363636, 'prompt_time': 0.005940527, 'queue_time': 0.05428515, 'total_time': 0.122304163}, 'model_name': 'llama-3.3-70b-versatile', 'system_fingerprint': 'fp_76dc6cf67d', 'finish_reason': 'stop', 'logprobs': None}, id='run-e3461579-16c9-47b6-8e97-f218e8fc34bb-0', usage_metadata={'input_tokens': 39, 'output_tokens': 32, 'total_tokens': 71})

In [4]:
type(llm_response)

langchain_core.messages.ai.AIMessage

In [5]:
llm_response.content

"Here's one:\n\nWhat do you call a fake noodle?\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another one?"

### Parsing output

Instead of using the AI message content property, we parse the output instead

In [6]:
from langchain_core.output_parsers import StrOutputParser

output_parser = StrOutputParser()

# Send response to parser
output_parser.invoke(llm_response)

"Here's one:\n\nWhat do you call a fake noodle?\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another one?"

### Chaining 

Helps us combine steps that invole the invoke command together into a single chain

In [7]:
chain = llm | output_parser
chain.invoke("Tell me a joke")

"Here's one:\n\nWhat do you call a fake noodle?\n\nAn impasta!\n\nHope that made you laugh! Do you want to hear another one?"

### Structured Output

However, we prefer a more structured output in real applications than a simple plain text output for futher manipulation


In [8]:
from typing import List
from pydantic import BaseModel, Field

# Response structure expected from the LLMs
class MobileReview(BaseModel):
  phone_model: str = Field(description="Name and model of the phone")
  rating: float = Field(description="Overall rating out of 5")
  pros: List[str] = Field(description="List of positive aspects")
  cons: List[str] = Field(description="List of negative aspects")
  summary: str = Field(description="Brief Summary of review")
  
review_text = """
Just got my hands on the new Galaxy s21 and wow this thing is slick!, The screen is gorgeous, colours pop like crazy. 
Camera's insane too, especially at night- my isnta game has never been better. battery life is solid, lasts me all day with no problems

Not gonna lie though, it is expensive. And what is with the no charger in the box? Come on Samsung. Also still getting used to the new button layout,
I keep hitting the bixby button on accident

Overalll, I would say it is a solid 4 out of 5. Great phone, but a few annoying quirks keep it from being perfect. If you are due for an upgrade,
definetly worth checking out
"""
  
structured_response = llm.with_structured_output(MobileReview)
response = structured_response.invoke(review_text)

response


MobileReview(phone_model='Galaxy s21', rating=4.0, pros=['gorgeous screen', 'excellent camera', 'solid battery life'], cons=['expensive', 'no charger in the box', 'new button layout can be confusing'], summary='A solid phone with a few annoying quirks, great for those due for an upgrade')

### Prompt Template

Allows us to use variables in our prompts to pass in later making them dynamic

In [9]:
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
prompt.invoke({"topic": "programming"})

ChatPromptValue(messages=[HumanMessage(content='Tell me a joke about programming', additional_kwargs={}, response_metadata={})])

In [10]:
chain = prompt | llm | output_parser
chain.invoke({"topic": "programming"})

'Why do programmers prefer dark mode?\n\nBecause light attracts bugs.'

### System Prompts

Customises the LLMs behaviour to finetune its output on the prompt

In [11]:
from langchain_core.messages import HumanMessage, SystemMessage

system_message = SystemMessage(content="You are a helpful assistant that tells jokes")
human_message = HumanMessage(content="Tell me about programming")
llm.invoke([system_message, human_message])

AIMessage(content="Programming. Where the only thing more frustrating than the bugs is the fact that you can't squash them with a fly swatter.\n\nBut seriously, programming is the process of writing instructions that a computer can understand, in order to perform specific tasks. It involves designing, writing, testing, and maintaining the source code of computer programs.\n\nThere are many programming languages, each with its own strengths and weaknesses. Some popular ones include:\n\n* Python: Known for its simplicity and versatility, Python is a great language for beginners and experts alike. It's like the Swiss Army knife of programming languages – it can do just about anything.\n* Java: A popular language for building large-scale applications, Java is known for its platform independence and robust security features. It's like the Toyota of programming languages – reliable, but not always the most exciting.\n* JavaScript: Used primarily for web development, JavaScript is a versatile

In [12]:
# Adding system messages to the template
template = ChatPromptTemplate([
  ("system", "You are a helpful assistant that tells jokes"),
  ("human", "Tell me a joke about {topic}")
])

chain = template | llm | output_parser
chain.invoke({"topic": "programming"})

'Why do programmers prefer dark mode?\n\nBecause light attracts bugs.'

### RAG Overview

Two Main Components:


1. Indexing: A pipeline for ingesting data from a source and indexing it. (Usually done offline)

2. Retrieval and Generation: The actual RAG chain, which takes the user query **at run time** and retrives the relevant data from the index, and then passes it to the model

#### Indexing

1. Load: First we load the data using Document Loaders

2. Split: Text splitters break large Documents into smaller chunks. This is useful for both indexing and passing it into a model, as large chunks are harder to search over and wont fit in a model's finite context window

3. Store: Store and index the splits into a vector database or embeddings model


In [13]:
from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from typing import List
from langchain_core.documents import Document

# Define text splitter
text_splitter = RecursiveCharacterTextSplitter(
  chunk_size=1000, # size of each chunk
  chunk_overlap=200, # overlap between chunks to minimise effects of bad split
  length_function=len
)

# 1. Load Documents
pdf_loader = PyPDFLoader(r"docs\M1-L2-Network Layer & Physical Resilience.pdf")
documents = pdf_loader.load()

print(len(documents))

47


In [14]:
documents[46]

Document(metadata={'producer': 'Adobe PDF Services', 'creator': 'Microsoft® PowerPoint® 2019', 'creationdate': '2025-01-15T09:00:47+00:00', 'author': 'junluo@ntu.edu.sg', 'moddate': '2025-01-15T17:01:56+08:00', 'subject': 'CE3005@NTU', 'title': '02. Network Layer and Physical Resilience', 'source': 'docs\\M1-L2-Network Layer & Physical Resilience.pdf', 'total_pages': 47, 'page': 46, 'page_label': '47'}, page_content='Application Layer\n• Application layer interacts with application programs \nand is the highest level of OSI model\n• Application layer contains management functions to \nsupport distributed applications\n• Examples of application layer are applications such \nas file transfer, electronic mail, remote login etc\n47')

In [15]:
print(f"total characters in this document: {len(documents[46].page_content)}")

total characters in this document: 309


In [16]:
# 2. Split the Document into chunks
chunks = text_splitter.split_documents(documents)

print(f"Split document into {len(chunks)} chunks")

Split document into 46 chunks


In [17]:
print(f"total characters in this chunk: {len(chunks[0].page_content)}")

total characters in this chunk: 379


In [18]:
# 3. Embed Documents for Storage

embeddings = OpenAIEmbeddings()

document_embeddings = embeddings.embed_documents([chunk.page_content for chunk in chunks])
print(f"Created embeddings for {len(document_embeddings)} document chunks")

Created embeddings for 46 document chunks


In [19]:
document_embeddings[0]

[0.00861442182213068,
 -0.003327538026496768,
 -0.0010474962182343006,
 -0.005147886928170919,
 0.007656519301235676,
 0.00016118546773213893,
 -0.008125422522425652,
 0.0057172696106135845,
 -0.01883651129901409,
 -0.02628537267446518,
 0.016545584425330162,
 0.02438296563923359,
 -0.01109290961176157,
 0.03381462022662163,
 0.019024072214961052,
 0.0036708421539515257,
 0.015004902146756649,
 -0.013223069719970226,
 0.010161802172660828,
 -0.016826925799250603,
 -0.01694750040769577,
 0.009177105501294136,
 -0.003108158241957426,
 -0.012044113129377365,
 0.013531206175684929,
 0.021971464157104492,
 0.005720619112253189,
 -0.020631739869713783,
 0.007884272374212742,
 -0.005191428121179342,
 0.023538939654827118,
 -0.01827382668852806,
 0.0007619676762260497,
 -0.004863196052610874,
 -0.000986371305771172,
 0.00023675424745306373,
 0.012740769423544407,
 -0.028509313240647316,
 0.022212613373994827,
 -0.018032677471637726,
 0.03228733316063881,
 0.026352358981966972,
 -0.009136913344

### Create Chroma Vector DB

In [20]:
from langchain_chroma import Chroma

vector_store = Chroma.from_documents(collection_name="my_collection", documents=chunks, embedding=embeddings)

#### Perform a Similarity Search in the DB for a Query

In [21]:
query = "What are the benefits of a layered approach"
search_results = vector_store.similarity_search(query, k=2)

print(f"\nTop 2 most relvant chunks for this query: '{query}'\n")
for i, result in enumerate(search_results, 1):
  print(f"Result {i}:")
  print(f"Source: {result.metadata.get('source', 'Unknown')}")
  print(f"Content: {result.page_content}")
  print()  


Top 2 most relvant chunks for this query: 'What are the benefits of a layered approach'

Result 1:
Source: docs\M1-L2-Network Layer & Physical Resilience.pdf
Content: Benefits of Layers
• Simplicity
– Easy to design once layers and their interactions are 
defined clearly
• Flexibility
– Easy to modify and develop networks by separate 
layers modifications
• Incremental Changes
– Easy to add news layers, add new functions to a layer
9

Result 2:
Source: docs\M1-L2-Network Layer & Physical Resilience.pdf
Content: Layered Network Architecture



#### Modifying the Vector Store retrieval so it can be added to the chain

In [22]:
retriever = vector_store.as_retriever(search_kwargs={"k": 2})
retriever.invoke("OSI layer Model")

[Document(id='348c7535-5d3c-4f74-b61c-f2e01ce26c33', metadata={'author': 'junluo@ntu.edu.sg', 'creationdate': '2025-01-15T09:00:47+00:00', 'creator': 'Microsoft® PowerPoint® 2019', 'moddate': '2025-01-15T17:01:56+08:00', 'page': 17, 'page_label': '18', 'producer': 'Adobe PDF Services', 'source': 'docs\\M1-L2-Network Layer & Physical Resilience.pdf', 'subject': 'CE3005@NTU', 'title': '02. Network Layer and Physical Resilience', 'total_pages': 47}, page_content='OSI Model in a Nutshell\n18'),
 Document(id='de320ff0-9bcd-49de-ac0e-40029639701d', metadata={'author': 'junluo@ntu.edu.sg', 'creationdate': '2025-01-15T09:00:47+00:00', 'creator': 'Microsoft® PowerPoint® 2019', 'moddate': '2025-01-15T17:01:56+08:00', 'page': 10, 'page_label': '11', 'producer': 'Adobe PDF Services', 'source': 'docs\\M1-L2-Network Layer & Physical Resilience.pdf', 'subject': 'CE3005@NTU', 'title': '02. Network Layer and Physical Resilience', 'total_pages': 47}, page_content='OSI Reference Model: 7 Layers \n(More o

#### Adding the retriever to the chain

In [23]:
template = """
Answer the question based only on the following context:
{context}

Question: {question}

Answer:
"""

prompt = ChatPromptTemplate.from_template(template)

##### Runnable Pass Through lets us pass the input to the invoke as the question

In [24]:
from langchain.schema.runnable import RunnablePassthrough

rag_chain = ({
  "context" : retriever,
  "question": RunnablePassthrough()
} | prompt)

rag_chain.invoke("Explain the OSI Layer Model")

ChatPromptValue(messages=[HumanMessage(content="\nAnswer the question based only on the following context:\n[Document(id='de320ff0-9bcd-49de-ac0e-40029639701d', metadata={'author': 'junluo@ntu.edu.sg', 'creationdate': '2025-01-15T09:00:47+00:00', 'creator': 'Microsoft® PowerPoint® 2019', 'moddate': '2025-01-15T17:01:56+08:00', 'page': 10, 'page_label': '11', 'producer': 'Adobe PDF Services', 'source': 'docs\\\\M1-L2-Network Layer & Physical Resilience.pdf', 'subject': 'CE3005@NTU', 'title': '02. Network Layer and Physical Resilience', 'total_pages': 47}, page_content='OSI Reference Model: 7 Layers \\n(More on Supplementary Materials)\\n11\\nOSI Model\\nData unit Layer Function\\nHost\\nlayers\\nData\\n7. Application Network process to application\\n6. Presentation\\nData representation, encryption and \\ndecryption, convert machine dependent \\ndata to machine independent data\\n5. Session Inter-host communication, managing \\nsessions between applications\\nSegment 4. Transport End-to

##### Process documents to convert them into strings seperated by newlines

In [25]:
def docs_to_str(docs):
  return "\n\n".join(doc.page_content for doc in documents)

In [26]:
rag_chain = (
  {"context": retriever | docs_to_str, "question": RunnablePassthrough()} | prompt
)

rag_chain.invoke("Explain the OSI model")

ChatPromptValue(messages=[HumanMessage(content='\nAnswer the question based only on the following context:\nPart I Syllabus\n1\nLecture Date Subject\n1 15/01/2025 Introduction\n20/01/2025 Network layer & physical resilience\n2 Data link layer – Flow control\nData link layer – Error control\n3 Local area network – Introduction\n4 Local area network – MAC\nLocal area network – Ethernet\n5 Local area network – WLAN\nMobile Access Networks: From 1G to 5G \n6 Packet switch network – Network paradigm\n\nWhich one is more valuable?\n2\n\nSC2008 Computer Networks\n3\nLecture 2\nNetwork Layers and Physical Resilience\n\nContents\n• Layered Network Architecture\n– Motivations and Benefits\n– OSI 7-Layer Model\n– Internet 5-Layer Implementation\n• Physical Resilience\n– Link Failure probability\n– Network resilience calculation\n4\n\nLayered Network Architecture\n\nMotivations for Layered Network \nArchitecture\n• Networks are complex with many pieces\n– Hosts, routers, links, applications, proto

In [27]:
# Complete RAG Chain with LLM Output
RAG_chain = (
   {"context": retriever | docs_to_str, "question": RunnablePassthrough()} | prompt | llm | output_parser
)

question = "Explain the OSI Model"
response = RAG_chain.invoke(question)
print(response)

The OSI (Open Systems Interconnection) Model is a 7-layered framework that describes how data is transmitted over a network. The 7 layers are:

1. Physical Layer: Defines the physical means of transmitting data between devices.
2. Data Link Layer: Provides reliable data transfer between two devices on the same network.
3. Network Layer: Routes data between devices on different networks.
4. Transport Layer: Ensures reliable data transfer between devices.
5. Session Layer: Establishes, manages, and terminates connections between applications.
6. Presentation Layer: Converts data into a format that can be understood by the receiving device.
7. Application Layer: Supports functions such as email, file transfer, and remote login.

Each layer communicates with its peer layer on other devices through a well-defined interface, allowing data to be transmitted reliably and efficiently across a network. The OSI Model helps network designers and engineers to understand and implement network protoc

# Conversational RAG

## Handle follow up questions

In [28]:
from langchain_core.messages import HumanMessage, AIMessage

chat_history = []

chat_history.extend([
  HumanMessage(content=question),
  AIMessage(content=response)
])

In [29]:
chat_history

[HumanMessage(content='Explain the OSI Model', additional_kwargs={}, response_metadata={}),
 AIMessage(content='The OSI (Open Systems Interconnection) Model is a 7-layered framework that describes how data is transmitted over a network. The 7 layers are:\n\n1. Physical Layer: Defines the physical means of transmitting data between devices.\n2. Data Link Layer: Provides reliable data transfer between two devices on the same network.\n3. Network Layer: Routes data between devices on different networks.\n4. Transport Layer: Ensures reliable data transfer between devices.\n5. Session Layer: Establishes, manages, and terminates connections between applications.\n6. Presentation Layer: Converts data into a format that can be understood by the receiving device.\n7. Application Layer: Supports functions such as email, file transfer, and remote login.\n\nEach layer communicates with its peer layer on other devices through a well-defined interface, allowing data to be transmitted reliably and ef

##### Contextualise the follow up question so that the LLM knows to base the answer on the previous question and not give a generic answer

In [30]:
from langchain_core.prompts import MessagesPlaceholder

contextualize_q_system_prompt = (
  "Given a chat history and the latest user question"
  "which might reference context in the chat history,"
  "formulate a standalone question which can be understood"
  "without the chat history. Do not answer the question,"
  "just reformulate it if needed and otherwise return it as is"
)

contextualize_q_prompt = ChatPromptTemplate(
  [
    ("system", contextualize_q_system_prompt),
    MessagesPlaceholder("chat_history"),
    ("human", "{input}"),
  ]
)

contextualize_q_chain = contextualize_q_prompt | llm | output_parser
contextualize_q_chain.invoke({"input": "What is the first layer?", "chat_history": chat_history})

'What is the function and characteristics of the first layer of the OSI Model, which is responsible for transmitting raw bits over a physical medium?'

##### Retrieving documents related to our chat history

In [31]:
from langchain.chains import create_history_aware_retriever

history_aware_retriever = create_history_aware_retriever(
  llm, retriever, contextualize_q_prompt
)

history_aware_retriever.invoke({"input": "What is the first layer?", "chat_history": chat_history})

[Document(id='de320ff0-9bcd-49de-ac0e-40029639701d', metadata={'author': 'junluo@ntu.edu.sg', 'creationdate': '2025-01-15T09:00:47+00:00', 'creator': 'Microsoft® PowerPoint® 2019', 'moddate': '2025-01-15T17:01:56+08:00', 'page': 10, 'page_label': '11', 'producer': 'Adobe PDF Services', 'source': 'docs\\M1-L2-Network Layer & Physical Resilience.pdf', 'subject': 'CE3005@NTU', 'title': '02. Network Layer and Physical Resilience', 'total_pages': 47}, page_content='OSI Reference Model: 7 Layers \n(More on Supplementary Materials)\n11\nOSI Model\nData unit Layer Function\nHost\nlayers\nData\n7. Application Network process to application\n6. Presentation\nData representation, encryption and \ndecryption, convert machine dependent \ndata to machine independent data\n5. Session Inter-host communication, managing \nsessions between applications\nSegment 4. Transport End-to-end connections, reliability \nand flow control\nMedia\nlayers\nPacket 3. Network Path determination and logical \naddressin

#### Final Chain

User query → Retriever → Documents → create_stuff_documents_chain → Formats prompt with {context} filled → LLM → Response

In [38]:
from langchain.chains import create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

question_answer_prompt = ChatPromptTemplate.from_messages([
  ("system", "You are an helpful AI assitant tasked to answer the user's questions"),
  ("system", "Context: {context}"),
  MessagesPlaceholder(variable_name="chat_history"),
  ("human", "{input}")
  
])

question_answer_chain = create_stuff_documents_chain(llm, question_answer_prompt)
rag_chain = create_retrieval_chain(history_aware_retriever, question_answer_chain)

In [39]:
rag_chain.invoke({"input": "What is the first layer?", "chat_history": chat_history})

{'input': 'What is the first layer?',
 'chat_history': [{'role': 'human',
   'content': 'What is the page number of the page that talks about the benefits of a layered approach?'},
  {'role': 'ai',
   'content': 'The page number is not explicitly stated, but based on the context provided, it appears that the benefits of layers are discussed on page 9, as indicated by the line "9" at the end of the benefits section.'}],
 'context': [Document(id='6bf9afd5-8882-47d3-b820-b10e4448da01', metadata={'author': 'junluo@ntu.edu.sg', 'creationdate': '2025-01-15T09:00:47+00:00', 'creator': 'Microsoft® PowerPoint® 2019', 'moddate': '2025-01-15T17:01:56+08:00', 'page': 4, 'page_label': '5', 'producer': 'Adobe PDF Services', 'source': 'docs\\M1-L2-Network Layer & Physical Resilience.pdf', 'subject': 'CE3005@NTU', 'title': '02. Network Layer and Physical Resilience', 'total_pages': 47}, page_content='Layered Network Architecture'),
  Document(id='e361b517-b6c5-405c-bba8-effb7111b7dd', metadata={'autho

### Multi User Chat Bot

##### Test out the firebase db

In [34]:
import uuid
from database import get_chat_history

session_id = str(uuid.uuid4())
question = "What is the page number of the page that talks about the benefits of a layered approach?"
chat_history = get_chat_history(session_id=session_id)
answer = rag_chain.invoke({"input": question, "chat_history": chat_history})['answer']
print(f"human: {question}")
print(f"AI: {answer}")

  return query.where(field_path, op_string, value)


human: What is the page number of the page that talks about the benefits of a layered approach?
AI: The page number is not explicitly stated, but based on the context provided, it appears that the benefits of layers are discussed on page 9, as indicated by the line "9" at the end of the benefits section.


In [35]:
# Add chat history to database
from database import insert_chat_logs

insert_chat_logs(
  session_id=session_id,
  user_query=question,
  llm_response=answer,
  model="llama-3.3-70b-versatile"
)

{'session_id': '907babd3-0f56-4af9-ad0e-504afc356707', 'user_query': 'What is the page number of the page that talks about the benefits of a layered approach?', 'llm_response': 'The page number is not explicitly stated, but based on the context provided, it appears that the benefits of layers are discussed on page 9, as indicated by the line "9" at the end of the benefits section.', 'model': 'llama-3.3-70b-versatile', 'created_at': Sentinel: Value used to set a document field to the server timestamp.}
<class 'dict'>


In [36]:
question2 = "What was the first page number provided?"
chat_history = get_chat_history(session_id)
answer2 = rag_chain.invoke({"input": question, "chat_history": chat_history})["answer"]
print(f"human: {question2}")
print(f"AI: {answer2}")

  return query.where(field_path, op_string, value)


human: What was the first page number provided?
AI: The benefits of layers are discussed on page 9.


In [37]:
insert_chat_logs(
  session_id=session_id,
  user_query=question2,
  llm_response=answer2,
  model="llama-3.3-70b-versatile"
)

{'session_id': '907babd3-0f56-4af9-ad0e-504afc356707', 'user_query': 'What was the first page number provided?', 'llm_response': 'The benefits of layers are discussed on page 9.', 'model': 'llama-3.3-70b-versatile', 'created_at': Sentinel: Value used to set a document field to the server timestamp.}
<class 'dict'>
