# RAG Application: Ask Questions from a PDF Document using Large Language Models

Retrieval-Augmented Generation (RAG) is a generative AI framework that combines pre-trained large language models (LLMs) with external data sources. RAG improves the output of LLMs by using fresh data from authoritative knowledge bases and enterprise systems to generate more reliable responses.

This project uses RAG to answer questions about a PDF document. The system first retrieves the passages of the PDF that are most relevant to the question, then passes them to the large language model together with the question, and the model generates a response grounded in those passages. This way, we can extract precise information from a document.
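
To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not the pipeline used in this notebook: the passages are made up, and retrieval is a simple word-overlap count standing in for embedding search. It only shows how the retrieved text and the question end up in one prompt.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, passages, k=1):
    """Return the k passages that share the most words with the question."""
    q = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q & tokenize(p)), reverse=True)
    return ranked[:k]

def build_prompt(question, context_passages):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(context_passages)
    return f"Context: {context}\n\nQuestion: {question}"

# Hypothetical passages standing in for pages of a PDF.
passages = [
    "Check the engine oil level every month.",
    "The spare wheel is stored under the luggage compartment floor.",
    "Tyre pressure should be checked when the tyres are cold.",
]

question = "Where is the spare wheel stored?"
top = retrieve(question, passages)
prompt_text = build_prompt(question, top)
print(prompt_text)
```

In the real pipeline, the word-overlap step is replaced by a vector search over embeddings, and `prompt_text` is sent to the model.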

## 0. Setup Ollama

I used [Ollama](https://ollama.com) because it's an easy way to get up and running with large language models locally on my computer.

I chose the [Llama2](https://llama.meta.com/llama2) model by Meta AI.

In your terminal, run:

```bash
ollama run llama2
```

## 1. Loading Environment Variables and Setting Up the Model

In [1]:
import os
from dotenv import load_dotenv

# If you want to use the OpenAI API, you need to set the OPENAI_API_KEY environment variable
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
#MODEL = "gpt-3.5-turbo"

MODEL = "llama2"

## 2. Prepare Embeddings and Test the Model

In [2]:
from langchain_ollama import OllamaLLM, OllamaEmbeddings

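# Initialize the Ollama model and embeddings.
# Note: Ollama listens on port 11434 by default; this notebook points at 11435,
# so adjust base_url to match the port your Ollama server actually uses.
model = OllamaLLM(model=MODEL, base_url="http://localhost:11435")
embeddings = OllamaEmbeddings(model=MODEL, base_url="http://localhost:11435")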

# Invoke the model
response = model.invoke("What is machine learning in a few words?")
print(response)



Machine learning is a subfield of artificial intelligence that involves training computer systems to learn and improve their performance on a task without being explicitly programmed.


In [3]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser 
chain.invoke("what is machine learning in a few words?")

'\nMachine learning is a subfield of artificial intelligence (AI) that involves the use of algorithms and statistical models to enable machines to learn from data, make decisions, and improve their performance on a specific task over time.'
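
If the `|` syntax looks like magic, the sketch below shows the general idea with a toy class. This is a simplified illustration and not LangChain's implementation: `Step`, `fake_model`, and `strip_parser` are hypothetical names, and the parser here strips whitespace just to have something visible to do.

```python
class Step:
    """A minimal stand-in for a chainable component (not LangChain's class)."""

    def __init__(self, func):
        self.func = func

    def invoke(self, value):
        return self.func(value)

    def __or__(self, other):
        # `a | b` builds a new step that runs a, then feeds its result to b.
        return Step(lambda value: other.invoke(self.invoke(value)))

# Hypothetical components: a fake "model" and a parser that tidies its output.
fake_model = Step(lambda prompt: f"\nEcho: {prompt}")
strip_parser = Step(lambda text: text.strip())

toy_chain = fake_model | strip_parser
result = toy_chain.invoke("hello")
print(result)
```

Each `|` returns another object with the same `invoke` method, which is why chains of any length can be called the same way as a single component.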

## 3. Load the PDF Document

In [4]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader(r"D:\Projects\MainEL\LLMcode\car_recored.pdf")
pages = loader.load_and_split()
pages

[Document(metadata={'source': 'D:\\Projects\\MainEL\\LLMcode\\car_recored.pdf', 'page': 0, 'page_label': '1'}, page_content='$)&730-&5\x014"*-\x01%*&4&-'),
 Document(metadata={'source': 'D:\\Projects\\MainEL\\LLMcode\\car_recored.pdf', 'page': 2, 'page_label': '3'}, page_content='Dear Customer,\nWelcome to the Chevrolet family. We wish to thank you for choosing Chevrolet Sail.\nIt is our constant endeavor to provide you with products that offer excellent performance through out their ownership period. Which is \nwhy, in addition to offering great cars, we have also set up an extensive, and very well equipped network of retailers and auth orized \nservice centers across the country. \nNaturally, these Chevrolet retailer know everything about your car and provides you with the best service possible. In fact, every retailer \nis equipped with the most advanced technology, technicians specially trained by us and genuine spares. Needless to say, they are also \ncommitted to ensure your comp
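
`load_and_split` returns the document as a list of smaller pieces. As a rough picture of what splitting text into overlapping chunks means, here is a fixed-size character splitter in plain Python. It is a simplified sketch under my own assumptions (the `chunk_size` and `overlap` values are arbitrary), not the splitter LangChain uses.

```python
def split_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character chunks that overlap slightly."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

sample = "Welcome to the Chevrolet family. We wish to thank you for choosing Chevrolet Sail."
chunks = split_text(sample)
for number, chunk in enumerate(chunks):
    print(number, repr(chunk))
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one neighbouring chunk, which helps retrieval later.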

In [5]:
from langchain_core.prompts import PromptTemplate

template = """
Give me the correct answer based on the context below. If it is not based on the context, reply "I don't know".

Context: {context}

Question: {question}
"""

prompt = PromptTemplate.from_template(template)
print(prompt.format(context="Here is some context", question="Here is a question"))

#CNC-reverse it later


Give me the correct answer based on the context below. If it is not based on the context, reply "I don't know".

Context: Here is some context

Question: Here is a question



## 4. Chain the Prompt, Model, and Parser

In [6]:
chain = prompt | model | parser

In [7]:
chain.input_schema.model_json_schema()

{'properties': {'context': {'title': 'Context', 'type': 'string'},
  'question': {'title': 'Question', 'type': 'string'}},
 'required': ['context', 'question'],
 'title': 'PromptInput',
 'type': 'object'}

In [8]:
chain.invoke(
    {
        "context": "my specialty is machine learning", 
        "question": "what do you think is my college background?"
    }
)

' Based on the context you provided, I would assume that your college background is in computer science or a related field such as data science or statistics. Therefore, my answer is: Computer Science or a related field.'

## 5.0 Use a Vector Database to Store and Retrieve the Results

In [None]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(pages, embedding=embeddings)
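
Conceptually, an in-memory vector store keeps one embedding per chunk and, at query time, returns the chunks whose embeddings are most similar to the query embedding. The sketch below shows that idea with cosine similarity. `ToyVectorStore` is a hypothetical class and the 3-dimensional vectors are made up; it is not how `DocArrayInMemorySearch` is implemented.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

class ToyVectorStore:
    """Hypothetical in-memory store: keeps (vector, text) pairs in a list."""

    def __init__(self):
        self.items = []

    def add(self, vector, text):
        self.items.append((vector, text))

    def search(self, query_vector, k=1):
        # Rank every stored item by similarity to the query, best first.
        ranked = sorted(
            self.items,
            key=lambda item: cosine_similarity(query_vector, item[0]),
            reverse=True,
        )
        return [text for _, text in ranked[:k]]

# Made-up 3-dimensional "embeddings"; real ones have far more dimensions.
store = ToyVectorStore()
store.add([0.9, 0.1, 0.0], "Engine oil and maintenance schedule")
store.add([0.1, 0.9, 0.1], "Warranty terms and conditions")
store.add([0.0, 0.2, 0.9], "Audio system controls")

matches = store.search([0.8, 0.2, 0.1], k=2)
print(matches)
```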



In [34]:
retriever = vectorstore.as_retriever()

In [35]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)
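
The dictionary at the top of this chain runs each of its values on the same input and collects the results under the matching keys. Here is a plain-Python sketch of that fan-out. `fake_retriever` and `fan_out` are hypothetical stand-ins, not LangChain functions.

```python
from operator import itemgetter

def fake_retriever(question):
    """Hypothetical retriever: returns a canned passage for any question."""
    return [f"passage relevant to: {question}"]

# Each key maps to a function of the same input dict.
branches = {
    "context": lambda inputs: fake_retriever(itemgetter("question")(inputs)),
    "question": itemgetter("question"),
}

def fan_out(inputs):
    """Run every branch on the same input and collect the results by key."""
    return {key: branch(inputs) for key, branch in branches.items()}

prompt_inputs = fan_out(
    {"question": "How often should I change the oil?", "context": "ignored"}
)
print(prompt_inputs)
```

Note that both branches read only the `"question"` key, so any `"context"` value supplied by the caller is never used: the context always comes from the retriever.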

In [36]:
# Ensure pages are loaded
if not pages:
    raise ValueError("No pages were loaded from the PDF.")

# Combine all pages into a single string (for inspection only: the retrieval
# chain above fetches its own context from the retriever)
context = "\n".join(page.page_content for page in pages)

In [None]:

questions = [
    "Give me the nature of refinery controller environment"
]

for question in questions:
    print(f"Question: {question}")
    # The chain builds its own context from the retriever, so only the question is passed
    print(f"Answer: {chain.invoke({'question': question})}")
    print()


## 5.1 Streaming Questions to the Language Model

`stream` yields the response piece by piece as the model generates it, instead of returning it all at once. Printing each piece as it arrives gives the typewriter effect you see in chatbots.

In [None]:
for s in chain.stream({"question": "Intelligent Agents"}):
    print(s, end="", flush=True)
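
To see why `end=""` and `flush=True` matter, here is a small sketch that runs without a model. `fake_stream` is a hypothetical generator standing in for `chain.stream`, and the chunk size is arbitrary.

```python
def fake_stream(text, chunk_size=4):
    """Hypothetical stand-in for chain.stream: yields text in small pieces."""
    for start in range(0, len(text), chunk_size):
        yield text[start:start + chunk_size]

answer = "Streaming prints each piece as soon as it arrives."
pieces = []
for piece in fake_stream(answer):
    # end="" keeps the pieces on one line; flush=True shows each one immediately
    print(piece, end="", flush=True)
    pieces.append(piece)
print()

full_answer = "".join(pieces)
```

Joining the pieces gives back the complete answer, so you can stream to the screen and still keep the full text.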

## 5.2 Batching Questions to the Language Model

`batch` sends a list of questions to the model in a single call. This is useful when you have many questions and don't want to wait for the model to process them one by one, because the inputs are processed concurrently and the answers come back in the same order as the questions.

In [26]:
questions = [
    "tell me the challenges that OpenAI faced",
    #"another question here"
]

In [27]:

chain.batch([{"question": q} for q in questions])
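
As a rough mental model of what batching does, the sketch below runs several calls concurrently with a thread pool and returns the answers in input order. `fake_invoke` and `fake_batch` are hypothetical stand-ins that only sleep and echo; this is an illustration of the idea, not LangChain's code.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_invoke(inputs):
    """Hypothetical stand-in for chain.invoke: waits briefly, then answers."""
    time.sleep(0.1)
    return f"answer to: {inputs['question']}"

def fake_batch(all_inputs, max_workers=4):
    """Run fake_invoke on every input concurrently; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_invoke, all_inputs))

batch_questions = ["first question", "second question", "third question"]
start = time.perf_counter()
answers = fake_batch([{"question": q} for q in batch_questions])
elapsed = time.perf_counter() - start

print(answers)
print(f"{elapsed:.2f}s for {len(batch_questions)} questions")
```

With a local model, the real speed-up depends on how many requests your Ollama server can handle at once. If you need to limit parallelism, `chain.batch` accepts a `config` argument such as `{"max_concurrency": 2}`.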