# A Gentle Introduction to RAG Applications

This notebook creates a simple RAG (Retrieval-Augmented Generation) system to answer questions from a PDF document using an open-source model.


In [1]:
PDF_FILE = "paul.pdf"
MODEL = "llama3.2"

## Loading the PDF document

Let's start by loading the PDF document and breaking it down into separate pages.

<img src='images/documents.png' width="1000">


In [2]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(PDF_FILE)
pages = loader.load()

print(f"Number of pages: {len(pages)}")
print(f"Length of a page: {len(pages[1].page_content)}")
print("Content of a page:", pages[1].page_content)

Number of pages: 9
Length of a page: 3217
Content of a page: 10% a week. And while 110 may not seem much better than 100,if you keep growing at 10% a week you'll be surprised how bigthe numbers get. After a year you'll have 14,000 users, and after2 years you'll have 2 million.You'll be doing different things when you're acquiring users athousand at a time, and growth has to slow down eventually. Butif the market exists you can usually start by recruiting usersmanually and then gradually switch to less manual methods. [3]Airbnb is a classic example of this technique. Marketplaces are sohard to get rolling that you should expect to take heroic measuresat first. In Airbnb's case, these consisted of going door to door inNew York, recruiting new users and helping existing ones improvetheir listings. When I remember the Airbnbs during YC, I picturethem with rolly bags, because when they showed up for tuesdaydinners they'd always just flown back from somewhere.FragileAirbnb now seems like an 

## Splitting the pages into chunks

Each page is too long to embed and retrieve as a single unit, so let's split the pages into smaller chunks that overlap slightly.

<img src='images/splitter.png' width="1000">


In [3]:
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=1500, chunk_overlap=300)

chunks = splitter.split_documents(pages)
print(f"Number of chunks: {len(chunks)}")
print(f"Length of a chunk: {len(chunks[3].page_content)}")
print("Content of a chunk:", chunks[3].page_content)


Number of chunks: 33
Length of a chunk: 33
Content of a chunk: https://paulgraham.com/ds.html1/9
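
To build intuition for `chunk_size` and `chunk_overlap`, here is a minimal sketch of a fixed-size sliding window over characters. It is a simplification, not the algorithm `RecursiveCharacterTextSplitter` uses: the real splitter prefers to break on separators such as paragraphs, lines, and words, which is why a chunk can come out much shorter than `chunk_size`. The function name `split_with_overlap` is hypothetical.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Cut text into windows of chunk_size characters that share chunk_overlap characters."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


sample = "abcdefghijklmnopqrstuvwxyz"
pieces = split_with_overlap(sample, chunk_size=10, chunk_overlap=3)
print(pieces)
# ['abcdefghij', 'hijklmnopq', 'opqrstuvwx', 'vwxyz']
```

Each chunk starts with the last three characters of the previous one, so a sentence cut at a boundary still appears whole in at least one chunk.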


## Storing the chunks in a vector store

We can now generate embeddings for every chunk and store them in a vector store.

<img src='images/vectorstore.png' width="1000">


In [4]:
# Initialise the Pinecone client and index
import os

from pinecone import Pinecone, ServerlessSpec

# Read the API key from the environment instead of hard-coding it in the notebook.
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "nomadic-llama"

# Create the index only if it doesn't exist yet.
# Creating an index that already exists fails with a 409 Conflict (ALREADY_EXISTS).
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=3072,   # Replace with your model dimensions
        metric="cosine",  # Replace with your model metric
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )


In [5]:

index = pc.Index(index_name)  
index.describe_index_stats()  

{'dimension': 3072,
 'index_fullness': 0.0,
 'namespaces': {},
 'total_vector_count': 0}

In [7]:
import os

from langchain_community.embeddings import HuggingFaceInferenceAPIEmbeddings, OllamaEmbeddings
from langchain_community.vectorstores import FAISS
from langchain_pinecone import PineconeVectorStore

# Read the Hugging Face token from the environment instead of hard-coding it
# (the variable name is our choice). `api_key` is a required field; omitting it
# raises a pydantic ValidationError.
embeddings = HuggingFaceInferenceAPIEmbeddings(
    api_key=os.environ["HUGGINGFACEHUB_API_TOKEN"],
    model_name="meta-llama/Llama-3.2-3B-Instruct",
)
# Local alternative:
# embeddings = OllamaEmbeddings(base_url='http://localhost:11434/', model=MODEL)

# Embed every chunk and store the vectors in the Pinecone index.
vectorstore = PineconeVectorStore.from_documents(chunks, embeddings, index_name=index_name)
# To reuse an index that is already populated:
# vectorstore = PineconeVectorStore.from_existing_index(index_name=index_name, embedding=embeddings)
# To use a local in-memory store instead:
# vectorstore = FAISS.from_documents(chunks, embeddings)

## Setting up a retriever

We can use a retriever to find chunks in the vector store that are similar to a supplied question.

<img src='images/retriever.png' width="1000">


In [85]:
retriever = vectorstore.as_retriever()
retriever.invoke("Vitamin A and T3")

[Document(metadata={'source': 'Dr.JackKruse-Epi-paleoRx_ThePrescriptionforDiseaseReversalandOptimalHealth(2013,OptimizedLifePLC)-libgen.li.pdf', 'page': 74}, page_content='sex-steroid hormones are big cardiac risk factors. If you have a chronically elevated HS-CRP, you\ngenerally are in deep trouble. This tends to walk hand-in-hand with a low Vitamin D level, too.\nHORMONES AND HEART DISEASE\nThe major hormones are linked to heart disease: low testosterone, low estrogen, low thyroid'),
 Document(metadata={'source': 'Dr.JackKruse-Epi-paleoRx_ThePrescriptionforDiseaseReversalandOptimalHealth(2013,OptimizedLifePLC)-libgen.li.pdf', 'page': 148}, page_content='Table of Contents\nAbout The Author\nDr. Jack Kruse\nIntroduction\nChapter One\nClosing the great divide\nChapter Two\nPrimal Sense: It comes with your biology, so use it.\nChapter Three\nUsing Primal Sense to adapt to change\nChapter Four\nThe fuels of the Epi-paleo Rx and the current policy of truth in healthcare\nChapter Five\nWhat
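
Conceptually, the retriever embeds the question and returns the stored chunks whose vectors are closest to it under the index metric (cosine, in the index created earlier). The sketch below illustrates that idea with three-dimensional toy vectors. The vectors, the chunk labels, and the `top_k` helper are all made up for illustration, and a real vector store uses approximate search rather than sorting every entry.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vector: list[float], store: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k entries most similar to the query vector."""
    ranked = sorted(
        store,
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,  # highest similarity first
    )
    return [text for text, _ in ranked[:k]]


# Toy "vector store": (chunk text, embedding) pairs with invented values.
toy_store = [
    ("chunk about growth", [0.9, 0.1, 0.0]),
    ("chunk about hiring", [0.1, 0.9, 0.0]),
    ("chunk about pricing", [0.0, 0.2, 0.9]),
]

query = [1.0, 0.2, 0.0]  # pretend embedding of a question about growth
results = top_k(query, toy_store, k=2)
print(results)
# ['chunk about growth', 'chunk about hiring']
```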

## Configuring the model

We'll be using Ollama to load the local model in memory. After creating the model, we can invoke it with a question to get the response back.

<img src='images/model.png' width="1000">


In [81]:
from langchain_ollama import ChatOllama

model = ChatOllama(model=MODEL, temperature=0)
model.invoke("Who are you?")

AIMessage(content='I\'m an artificial intelligence model known as Llama. Llama stands for "Large Language Model Meta AI."', response_metadata={'model': 'llama3.1', 'created_at': '2024-09-11T17:49:57.442372Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 3226541125, 'load_duration': 31977000, 'prompt_eval_count': 15, 'prompt_eval_duration': 1337540000, 'eval_count': 23, 'eval_duration': 1855609000}, id='run-a771846d-df44-46b7-906c-b7e0f96e7048-0', usage_metadata={'input_tokens': 15, 'output_tokens': 23, 'total_tokens': 38})

## Parsing the model's response

The response from the model is an `AIMessage` instance containing the answer along with metadata. `StrOutputParser` extracts just the text of the answer, and the `|` operator connects the model and the parser into a chain.

<img src='images/parser.png' width="1000">


In [57]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser 
print(chain.invoke("Who is the president of the United States?"))

As of my last update in April 2023, Joe Biden is the President of the United States. He took office on January 20, 2021, succeeding Donald Trump as the 46th President of the United States. Please note that this information might change over time due to elections or other political developments.
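
As a rough mental model of what `model | parser` does, the sketch below composes two steps with the `|` operator so that the output of the first becomes the input of the second. This is only an illustration of the idea, not LangChain's implementation; the `Step` class, `fake_model`, and `fake_parser` are hypothetical stand-ins.

```python
class Step:
    """Wraps a function so that steps can be joined with the | operator."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # Build a new step that feeds this step's output into the next one.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Stand-in "model": returns a message-like dict instead of calling an LLM.
fake_model = Step(lambda question: {"content": f"Echo: {question}"})
# Stand-in "parser": extracts the text from the message.
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
answer = toy_chain.invoke("Who are you?")
print(answer)
# Echo: Who are you?
```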


## Setting up a prompt

In addition to the question we want to ask, we also want to provide the model with the context from the PDF file. We can use a prompt template to define and reuse the prompt we'll use with the model.

<img src='images/prompt.png' width="1000">


In [58]:
from langchain_core.prompts import PromptTemplate

template = """
You are an assistant that provides answers to questions based on
a given context. 

Answer the question based on the context. If you can't answer the
question, reply "I don't know".

Be as concise as possible and go straight to the point.

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))


You are an assistant that provides answers to questions based on
a given context. 

Answer the question based on the context. If you can't answer the
question, reply "I don't know".

Be as concise as possible and go straight to the point.

Context: Here is some context

Question: Here is a question
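
At its core, a prompt template is a string with named placeholders that are filled in at call time. The standard-library sketch below shows the same idea without LangChain: it lists the placeholders in a shortened version of the template and fills them in. The helper `placeholder_names` is hypothetical and only mimics the concept, not `PromptTemplate` itself.

```python
from string import Formatter

# Shortened version of the notebook's template, for illustration.
template_text = "Context: {context}\n\nQuestion: {question}\n"


def placeholder_names(text: str) -> list[str]:
    """Return the names of the {placeholders} found in a format string."""
    return [field for _, field, _, _ in Formatter().parse(text) if field]


names = placeholder_names(template_text)
filled = template_text.format(
    context="Anna's sister is Susan",
    question="Who is Susan's sister?",
)
print(names)
print(filled)
```

Every placeholder must receive a value: calling `template_text.format(context="...")` without `question` raises a `KeyError`.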



## Adding the prompt to the chain

We can now chain the prompt with the model and the parser.

<img src='images/chain1.png' width="1000">


In [59]:
chain = prompt | model | parser

chain.invoke({
    "context": "Anna's sister is Susan", 
    "question": "Who is Susan's sister?"
})


'Anna.'

## Adding the retriever to the chain

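Next, we can connect the retriever to the chain so that the context comes from the vector store instead of being typed in by hand. We also wrap the chain with a SQLite-backed message history so that the questions and answers of each session are saved.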

<img src='images/chain2.png' width="1000">


In [68]:
from operator import itemgetter
from langchain_core.runnables.history import RunnableWithMessageHistory
from langchain_community.chat_message_histories import SQLChatMessageHistory

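# Uncomment to start from an empty history:
# !rm memory.db

def get_session_history(session_id):
    # Each session's messages are stored in a local SQLite database.
    return SQLChatMessageHistory(session_id, "sqlite:///memory.db")

chain = (
    {
        # Look up the chunks most similar to the question.
        "context": itemgetter("question") | retriever,
        # Pass the question through unchanged.
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)

# Save every question and answer under the session id passed in the config.
chain_with_history = RunnableWithMessageHistory(chain, get_session_history, input_messages_key="question")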
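
The retriever returns a list of `Document` objects, so the `{context}` slot of the prompt receives the whole list, metadata included. A common refinement is to join only the text of the chunks before they reach the prompt. Here is a minimal sketch, assuming a hypothetical `format_docs` helper; the `Doc` dataclass is a stand-in for the retrieved documents so the example runs on its own.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Stand-in for a retrieved document: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def format_docs(docs: list[Doc]) -> str:
    # Keep only the text and separate chunks with a blank line.
    return "\n\n".join(doc.page_content for doc in docs)


retrieved = [
    Doc("Recruit users manually at first.", {"page": 1}),
    Doc("Growth has to slow down eventually.", {"page": 1}),
]
context_text = format_docs(retrieved)
print(context_text)
```

In the chain above, such a function could be piped after the retriever, as in `itemgetter("question") | retriever | format_docs`.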

## Using the chain to answer questions

Finally, we can use the chain to ask questions that will be answered using the PDF document.


In [61]:
questions = [
    "How can founders best set their company up for success?",
    "What's the most common unscalable thing founders have to do at the start?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print("*************************\n")

Question: How can founders best set their company up for success?
Answer: Start small and focus on user acquisition, manufacturing your own hardware or using software on users' behalf if necessary, and working hard to delight users when you only have a handful of them. This will help you develop habits that benefit the company in the long run.
*************************

Question: What's the most common unscalable thing founders have to do at the start?
Answer: Doing sales themselves initially.
*************************



In [71]:
from langchain_core.messages import HumanMessage

chain_with_history.invoke(
    {"question": "Why?"},
    config={"configurable": {"session_id": "1"}}
)

"I don't know."

In [72]:
get_session_history("1").get_messages()

[HumanMessage(content='What is the most common unscalable thing founders have to do at the start?'),
 AIMessage(content="Manufacture their own hardware or use software on users' behalf."),
 HumanMessage(content='Why?'),
 AIMessage(content="I don't know.")]
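
The stored messages show that both turns were saved, yet the follow-up "Why?" was answered with "I don't know." Note that the prompt template defined earlier only has `{context}` and `{question}` slots. One way to make a follow-up answerable is to render the earlier turns into the prompt text. The sketch below illustrates that idea with an in-memory store; the `History:` section, `record_turn`, and `build_prompt` are assumptions for illustration and are not part of the notebook's chain.

```python
from collections import defaultdict

# Hypothetical in-memory store: session_id -> list of (role, text) tuples.
sessions = defaultdict(list)


def record_turn(session_id: str, question: str, answer: str) -> None:
    """Save one question/answer exchange under the given session."""
    sessions[session_id].append(("human", question))
    sessions[session_id].append(("ai", answer))


def build_prompt(session_id: str, context: str, question: str) -> str:
    """Render the earlier turns, the context, and the new question as one prompt."""
    history = "\n".join(f"{role}: {text}" for role, text in sessions[session_id])
    return f"History:\n{history}\n\nContext: {context}\n\nQuestion: {question}\n"


record_turn(
    "1",
    "What's the most common unscalable thing founders have to do at the start?",
    "Doing sales themselves initially.",
)
followup_prompt = build_prompt("1", context="(retrieved chunks)", question="Why?")
print(followup_prompt)
```

With the earlier exchange in the prompt, the model can see what "Why?" refers to.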