# RAG Application: Ask Questions from a PDF Document using Large Language Models

Retrieval-Augmented Generation (RAG) is a generative AI framework that combines pre-trained large language models (LLMs) with external data sources. RAG improves the output of LLMs by using fresh data from authoritative knowledge bases and enterprise systems to generate more reliable responses.

This project uses RAG to answer questions about a PDF document. The system takes the question, retrieves the most relevant passages from the PDF, and passes them to the large language model as context so it can generate a response. This way, we can extract precise information from a document.
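To make the retrieve-then-generate flow concrete, here is a minimal sketch using only the Python standard library. It is not how the project itself retrieves text: the word-count "embedding", the two sample chunks, and the function names are all assumptions made up for illustration, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (a real system uses a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Combine the retrieved context and the question into one prompt."""
    return f"Context: {' '.join(context)}\n\nQuestion: {question}"


# Hypothetical chunks standing in for pages of a PDF.
chunks = [
    "The yoga session takes place on Monday mornings.",
    "The sleep workshop covers tips for better sleep quality.",
]
question = "Which workshop helps with sleep?"
top = retrieve(question, chunks)
prompt = build_prompt(question, top)
print(prompt)
```

The prompt built at the end is what would be sent to the model; the rest of this notebook does the same steps with real embeddings and a real LLM.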

## 0. Set Up Ollama

I used [Ollama](https://ollama.com) because I found it the easiest way to get large language models up and running locally on my computer.

In this case, I chose the [Llama2](https://llama.meta.com/llama2) model by Meta AI.

In your terminal, run the command below. It downloads the model the first time and then opens an interactive prompt:

```bash
ollama run llama2
```

## 1. Loading Environment Variables and Setting Up the Model

In [1]:
import os
from dotenv import load_dotenv

# Read OPENAI_API_KEY from a .env file; it is only needed for the OpenAI models
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
MODEL = "gpt-3.5-turbo"

# Uncomment to use the local Llama2 model served by Ollama instead
# MODEL = "llama2"

## 2. Prepare Embeddings and Test the Model

In [2]:
from langchain_community.llms import Ollama
from langchain_openai.chat_models import ChatOpenAI
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai.embeddings import OpenAIEmbeddings

if MODEL.startswith("gpt"):
    model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model=MODEL)
    embeddings = OpenAIEmbeddings()
else:
    model = Ollama(model=MODEL)
    embeddings = OllamaEmbeddings(model=MODEL)

model.invoke("what is machine learning in a few words?")

AIMessage(content='Machine learning is a type of artificial intelligence that allows computers to learn and improve from experience without being explicitly programmed.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 23, 'prompt_tokens': 16, 'total_tokens': 39, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-3.5-turbo-0125', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None}, id='run-f82996e6-1e48-4e38-99c0-6d2e0b1767b6-0', usage_metadata={'input_tokens': 16, 'output_tokens': 23, 'total_tokens': 39, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})

In [3]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser 
chain.invoke("what is machine learning in a few words?")

'Machine learning is a subset of artificial intelligence that enables computers to learn and improve from experience without being explicitly programmed.'
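The first call above returned an `AIMessage`, while the chained call returned a plain string, because `model | parser` feeds the model's output into the parser. Below is a rough sketch of how a `|` operator can compose two steps. The `Step` class and the fake model and parser are hypothetical stand-ins written for this example; they are not LangChain's implementation.

```python
class Step:
    """A tiny stand-in for a chain component: wraps a function and supports `|`."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))


# Hypothetical stand-ins for a model and an output parser.
fake_model = Step(lambda prompt: {"content": f"echo: {prompt}"})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
print(toy_chain.invoke("hello"))  # prints "echo: hello"
```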

## 3. Load the PDF Document

In [5]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("flyer_rise.pdf")
pages = loader.load_and_split()
pages

[Document(metadata={'source': 'flyer_rise.pdf', 'page': 0}, page_content='Tritons RISE Together\nRISE for your daily dose of Well-Being! Join us for our T ritons RISE T ogether innovative and experiential \nworkshops. Based on research from peak performance psychology, mind/body sciences and the field of positive \npsychology, our RISE Workshops are here to promote you reaching your potential in all areas of your life: \nemotional, physical, and social. Be inspired, learn skills, and build our T riton community! Y ou do not need to make \nan appointment unless the workshop indicates pre-registration — simply show up! Be sure to check out our RISE \nwebsite for additional special events being planned throughout the quarter and any calendar updates:  \nhttps:/ /caps.ucsd.edu/rise\nSunRISE Y oga Session (Recreation and CAPS)\nMondays 10–11am, Week 1 (1/6) – Week 10 (3/10); Except (1/20, 2/17)\nPresenting a CAPS & Recreation partnership yoga class! Join in our SunRISE community for a well-

In [6]:
from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))


Answer the question based on the context below. If you can't 
answer the question, reply "I don't know".

Context: Here is some context

Question: Here is a question



## 4. Chain the Prompt, Model, and Parser

In [7]:
chain = prompt | model | parser

In [8]:
chain.input_schema.model_json_schema()

{'properties': {'context': {'title': 'Context', 'type': 'string'},
  'question': {'title': 'Question', 'type': 'string'}},
 'required': ['context', 'question'],
 'title': 'PromptInput',
 'type': 'object'}

In [9]:
chain.invoke(
    {
        "context": "my specialty is machine learning", 
        "question": "what do you think is my college background?"
    }
)

'Based on your specialty in machine learning, it is likely that your college background is in a related field such as computer science, mathematics, statistics, or engineering.'

## 5.0 Use a Vector Store to Store and Retrieve Document Chunks

In [10]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(pages, embedding=embeddings)



In [11]:
retriever = vectorstore.as_retriever()

In [12]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)
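The dict at the head of this chain fans the same input out to two branches: one looks up the question in the retriever to produce `context`, the other passes the question through unchanged. Here is a plain-Python sketch of that idea. `fake_retriever` and `run_mapping` are hypothetical names invented for this example, and the retriever returns canned text instead of searching the vector store.

```python
from operator import itemgetter


def fake_retriever(query: str) -> list[str]:
    """Hypothetical retriever: returns canned text instead of searching a vector store."""
    return [f"(chunk relevant to: {query})"]


# Each key maps to a function that receives the same input dict.
mapping = {
    "context": lambda inputs: fake_retriever(itemgetter("question")(inputs)),
    "question": itemgetter("question"),
}


def run_mapping(inputs: dict) -> dict:
    """Apply every branch to the input and collect the results under the same keys."""
    return {key: branch(inputs) for key, branch in mapping.items()}


prompt_inputs = run_mapping({"question": "What is Performance Lab?"})
print(prompt_inputs)
```

The resulting dict has exactly the `context` and `question` keys that the prompt template expects.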

In [16]:
questions = [
    "What is Performance Lab?",
    "What is Performance Lab's zoom link?"
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print()

Question: What is Performance Lab?
Answer: Performance Lab is an interactive workshop on Tuesdays from 1-2pm, focusing on the "science of success" and strategies to help individuals flourish in the classroom and in life.

Question: What is Performance Lab's zoom link?
Answer: https://uchealth.zoom.us/j/84688908165



## 5.1 Streaming Questions to the Language Model

`stream` returns the response in chunks as the model generates it, so the answer appears incrementally with a typewriter effect, as in a chatbot.

In [17]:
for s in chain.stream({"question": "I felt stressed. Which workshop should I go to?"}):
    print(s, end="", flush=True)

You should consider attending the "Stress Better: Skills for Managing Stress" workshop with Melissa Hawthorne-Campos, LCSW on Thursdays from 2-3pm.
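The typewriter effect comes from consuming the answer piece by piece instead of waiting for the whole string. The sketch below imitates that with a generator. `fake_stream` is a hypothetical stand-in for a streaming model, and splitting on spaces is an assumption for illustration; a real model decides its own chunk boundaries.

```python
import time
from typing import Iterator


def fake_stream(answer: str, delay: float = 0.0) -> Iterator[str]:
    """Hypothetical stand-in for a streaming model: yields the answer word by word."""
    for word in answer.split(" "):
        time.sleep(delay)  # simulates generation latency between chunks
        yield word + " "


pieces = []
for piece in fake_stream("You should attend the stress workshop."):
    pieces.append(piece)
    print(piece, end="", flush=True)  # same printing pattern as the cell above
print()
```

Setting `delay` to a small value such as `0.1` makes the effect visible in a terminal.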

## 5.2 Batching Questions to the Language Model

`batch` sends a list of questions to the chain in a single call. The questions are processed in parallel, which is useful when you have many of them and don't want to wait for the model to answer each one before the next starts.

In [18]:
questions = [
    "I don't sleep well recently. What should I do?",
    #"another question here"
]

In [19]:

chain.batch([{"question": q} for q in questions])

['You can attend the "Sleep Reset" workshop on Thursdays from 3-4pm with Kaitlyn Saulman, Psy.D. to learn research-based tips and techniques for getting the best quantity and quality of sleep.']
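To illustrate what processing a batch in parallel can look like, here is a small sketch using a thread pool from the standard library. `fake_chain`, `batch`, and `max_workers` are hypothetical names for this example and only mimic the shape of the call above; they are not LangChain's code.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_chain(inputs: dict) -> str:
    """Hypothetical stand-in for chain.invoke; a real call would wait on the model."""
    return f"answer to: {inputs['question']}"


def batch(inputs: list[dict], max_workers: int = 4) -> list[str]:
    """Run fake_chain on every input concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_chain, inputs))


batch_questions = ["first question", "second question"]
answers = batch([{"question": q} for q in batch_questions])
print(answers)
```

Because the work is mostly waiting on a model or network response, threads are enough to overlap the calls, and the answers come back in the same order as the questions.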