# RAG Application: Ask Questions from a PDF Document using Large Language Models

Retrieval-Augmented Generation (RAG) is a generative AI framework that combines pre-trained large language models (LLMs) with external data sources. RAG improves the output of LLMs by using fresh data from authoritative knowledge bases and enterprise systems to generate more reliable responses.

This project uses RAG to answer questions about a PDF document. The system retrieves the passages of the PDF that are most relevant to the question, passes them to the large language model as context, and the model generates a response grounded in them. This way, we can extract precise information from a document.
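To make the retrieve-then-generate flow concrete, here is a minimal sketch that uses only the standard library. It is a toy under stated assumptions: the `CHUNKS` list, the word-overlap scoring, and the function names are all made up for illustration, and no real LLM is called. The notebook below uses embeddings and a vector store instead.

```python
import re

# Toy stand-ins for a real document store (hypothetical data and names).
CHUNKS = [
    "Ollama runs large language models locally.",
    "TinyLlama is a compact open language model.",
    "A retriever returns the chunks most similar to a question.",
]


def tokenize(text: str) -> set[str]:
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, chunks: list[str], k: int = 1) -> list[str]:
    """Rank chunks by how many words they share with the question."""
    q_words = tokenize(question)
    ranked = sorted(chunks, key=lambda c: len(q_words & tokenize(c)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Place the retrieved chunks in the prompt ahead of the question."""
    joined = "\n".join(context)
    return f"Context: {joined}\n\nQuestion: {question}"


question = "What is TinyLlama?"
top_chunks = retrieve(question, CHUNKS)
prompt_text = build_prompt(question, top_chunks)
print(prompt_text)  # in a real pipeline, this string would be sent to the LLM
```

The key point is the order of operations: retrieval happens first, and the model only sees the question together with the retrieved text.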

## 0. Setup Ollama

I used [Ollama](https://ollama.com) because it is a simple way to run large language models locally on my computer.

In this case, I used the [TinyLlama](https://arxiv.org/pdf/2401.02385.pdf) model, from the StatNLP Research Group at the Singapore University of Technology and Design.

In your terminal, run the following command, which downloads the model on first use and starts an interactive session:

```bash
ollama run tinyllama
```

## 1. Loading Environment Variables and Setting Up the Model

In [2]:
import os
from dotenv import load_dotenv

# If you want to use the OpenAI API, you need to set the OPENAI_API_KEY environment variable
load_dotenv()
OPENAI_API_KEY = os.getenv("OPENAI_API_KEY")
# MODEL = "gpt-3.5-turbo"

MODEL = "tinyllama"

## 2. Prepare Embeddings and Test the Model

In [3]:
from langchain_community.llms import Ollama
from langchain_openai.chat_models import ChatOpenAI
from langchain_community.embeddings import OllamaEmbeddings
from langchain_openai.embeddings import OpenAIEmbeddings

if MODEL.startswith("gpt"):
    model = ChatOpenAI(openai_api_key=OPENAI_API_KEY, model=MODEL)
    embeddings = OpenAIEmbeddings()
else:
    model = Ollama(model=MODEL)
    embeddings = OllamaEmbeddings(model=MODEL)

model.invoke("what is machine learning in a few words?")

'Machine learning (ML) is a field that enables computers to learn from data without being explicitly programmed. It helps computers to perform tasks by identifying patterns and making decisions based on the data fed into them. In essence, ML uses algorithms that can learn and improve on their own based on new data points or experiences.'

In [8]:
from langchain_core.output_parsers import StrOutputParser

parser = StrOutputParser()

chain = model | parser 
chain.invoke("what is machine learning in a few words?")

'Machine Learning (ML) is a field of Artificial Intelligence that enables computers to learn from data without being explicitly programmed. ML uses algorithms and statistical models to identify patterns and make predictions based on large amounts of historical data or real-world observations. It has numerous applications in fields such as healthcare, finance, logistics, and e-commerce, where it can improve accuracy, efficiency, and profitability. Machine learning is a versatile technique that can be applied in various fields, including computer vision, natural language processing (NLP), data mining, and robotics.'
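The `|` in `model | parser` composes two steps so that the output of the first becomes the input of the second. The sketch below shows one way such an operator can be built with Python's `__or__` method. It is only an illustration: the `Step` class, `fake_model`, and `fake_parser` are hypothetical names, and this is not how LangChain implements its runnables.

```python
class Step:
    """Wraps a function so that steps can be composed with `|`."""

    def __init__(self, func):
        self.func = func

    def __or__(self, other: "Step") -> "Step":
        # `a | b` builds a new step that runs a, then feeds its output to b.
        return Step(lambda value: other.func(self.func(value)))

    def invoke(self, value):
        return self.func(value)


# Hypothetical stand-ins: a "model" that returns a message dict, and a "parser"
# that extracts the text from it.
fake_model = Step(lambda prompt: {"content": f"echo: {prompt}"})
fake_parser = Step(lambda message: message["content"])

toy_chain = fake_model | fake_parser
result = toy_chain.invoke("hello")
print(result)  # echo: hello
```

With a chat model such as `ChatOpenAI`, the parser matters because the model returns a message object rather than a plain string. The `Ollama` LLM class already returns a string, so there the parser mostly keeps the chain interchangeable between the two backends.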

## 3. Load the PDF Document

In [4]:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("./../../../reports-pdf/058-CI-00233.pdf")
pages = loader.load_and_split()
pages

Ignoring wrong pointing object 9 0 (offset 0)
Ignoring wrong pointing object 19 0 (offset 0)
Ignoring wrong pointing object 21 0 (offset 0)
Ignoring wrong pointing object 23 0 (offset 0)
Ignoring wrong pointing object 41 0 (offset 0)
Ignoring wrong pointing object 49 0 (offset 0)
Ignoring wrong pointing object 137 0 (offset 0)


[Document(metadata={'source': './../../../reports-pdf/058-CI-00233.pdf', 'page': 0}, page_content='1'),
 Document(metadata={'source': './../../../reports-pdf/058-CI-00233.pdf', 'page': 1}, page_content='2 \nSeñores grupo de gestión documental  Comisión de la Verdad Asunto: Entrega de Informe a la Comisión de la Verdad Remitente: Agroarte Colombia Firma:    Luis Fernando Álvarez Ramírez “AKA” Representante Legal – Director Ejecutivo Agroarte Colombia Contacto: (+57) 3147123031 www.agroartecolombia.co agroarte@agroartecolombia.co aka@agroartecolombia.co  \n Katerin Delgado Franco presidenta-Directora General Agroarte Colombia Contacto: (+57) 3233885964 www.agroartecolombia.co agroarte@agroartecolombia.co katerindfranco@agroartecolombia.co   RESUMEN  Argumentación en favor del reconocimiento y dotación de protección jurídica a los espacios físicos de memoria generados por la interacción de las victimas y la sociedad civil en los territorios.  Objetivo de reivindicar y contribuir a evitar 
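`load_and_split` returns the PDF as a list of smaller `Document` chunks rather than one long text, which is what makes retrieval of specific passages possible. As a rough illustration of the idea, the sketch below cuts a string into fixed-size windows that overlap slightly. The function name, `chunk_size`, and `overlap` values are assumptions chosen for the example; LangChain's own splitter is more careful about where it cuts.

```python
def split_text(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Cut text into windows of chunk_size characters that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    return [text[start:start + chunk_size] for start in range(0, len(text), step)]


sample = "Retrieval works better when long pages are cut into smaller pieces."
chunks = split_text(sample)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.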

In [6]:
from langchain.prompts import PromptTemplate

template = """
Answer the question based on the context below. If you can't 
answer the question, respond with "I don't know".

Context: {context}

Question: {question}
"""

input_context = input("Enter the context: ")
input_question = input("Enter the question: ")
prompt = PromptTemplate.from_template(template)
print(prompt.format(context=input_context, question=input_question))


Answer the question based on the context below. If you can't 
answer the question, respond with "I don't know".

Context: Agroarte colombia

Question: qué es agroarte?
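The template has two placeholders, `context` and `question`, and both must be supplied before the prompt can be sent to the model. The sketch below reproduces that behavior with the standard library only, to show what the template step does. The helper names `template_variables` and `fill` are hypothetical and are not part of LangChain.

```python
from string import Formatter

TEMPLATE = """
Answer the question based on the context below. If you can't
answer the question, respond with "I don't know".

Context: {context}

Question: {question}
"""


def template_variables(template: str) -> list[str]:
    """List the placeholder names that appear in a format string."""
    return [name for _, name, _, _ in Formatter().parse(template) if name]


def fill(template: str, **values: str) -> str:
    """Fill the template, failing early if a placeholder has no value."""
    missing = [v for v in template_variables(template) if v not in values]
    if missing:
        raise KeyError(f"missing template variables: {missing}")
    return template.format(**values)


filled = fill(TEMPLATE, context="Agroarte Colombia", question="What is Agroarte?")
print(filled)
```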



## 4. Chain the Prompt, Model, and Parser

In [9]:
chain = prompt | model | parser

In [12]:
chain.input_schema.schema()

{'title': 'PromptInput',
 'type': 'object',
 'properties': {'context': {'title': 'Context', 'type': 'string'},
  'question': {'title': 'Question', 'type': 'string'}},
 'required': ['context', 'question']}

In [13]:
chain.invoke(
    {
        "context": input_context,
        "question": input_question
    }
)

'Respuesta: agrOARTE (Acronísmo de Arturo Ortega) es una empresa española especializada en la ciencia de la computación, fundada por el académico y científico de la ciencia de la computación estadounidense Acadèmic de la Academia Estadounidense del Cercado. La empresa opera a través de su subsidiaria agrOARTE Ciencia & Computación S.A.\n\nSus productos incluyen sistemas de big data, software libre e inteligencia artificial. AgrOARTE se encuentra presente en más de 30 países y sus clientes incluyen grandes corporaciones, instituciones gubernamentales y asociaciones científicas.'

## 5.0 Use a Vector Database to Store and Retrieve the Results

In [10]:
from langchain_community.vectorstores import DocArrayInMemorySearch

vectorstore = DocArrayInMemorySearch.from_documents(pages, embedding=embeddings)



In [11]:
retriever = vectorstore.as_retriever()
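Conceptually, a vector store keeps one embedding vector per chunk, and the retriever returns the chunks whose vectors are closest to the embedding of the question. The sketch below shows that search with cosine similarity over a plain dictionary. The three-dimensional vectors, the `store` contents, and the `search` function are made up for illustration; they are not produced by an embedding model and this is not the `DocArrayInMemorySearch` implementation.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings"; a real model produces far longer vectors.
store = {
    "page about memory sites": [0.9, 0.1, 0.0],
    "page about legal protection": [0.1, 0.9, 0.1],
    "page with contact details": [0.0, 0.1, 0.9],
}


def search(query_vector: list[float], k: int = 2) -> list[str]:
    """Return the k stored texts whose vectors are most similar to the query."""
    ranked = sorted(
        store,
        key=lambda text: cosine_similarity(query_vector, store[text]),
        reverse=True,
    )
    return ranked[:k]


hits = search([0.8, 0.3, 0.0])
print(hits)
```

Because the same embedding model must encode both the stored chunks and the question, the `embeddings` object is passed once when the store is built and reused at query time.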

In [5]:
from operator import itemgetter

chain = (
    {
        "context": itemgetter("question") | retriever,
        "question": itemgetter("question"),
    }
    | prompt
    | model
    | parser
)

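The dictionary at the start of this chain runs two branches on the same input: one looks up the question and sends it to the retriever to produce `context`, the other passes the question through unchanged. The sketch below imitates that data flow with plain functions so the shape of the prompt input is visible. `fake_retriever` and `run_parallel` are hypothetical stand-ins, not LangChain code.

```python
from operator import itemgetter


def fake_retriever(question: str) -> list[str]:
    """Hypothetical retriever: returns a canned chunk instead of searching a store."""
    return [f"chunk relevant to: {question}"]


def run_parallel(steps: dict, value: dict) -> dict:
    """Apply every function in `steps` to the same input, keeping the keys."""
    return {key: func(value) for key, func in steps.items()}


get_question = itemgetter("question")
steps = {
    "context": lambda value: fake_retriever(get_question(value)),
    "question": get_question,
}

prompt_inputs = run_parallel(steps, {"question": "What is RAG?"})
print(prompt_inputs)
```

The result has exactly the two keys the prompt template expects, which is why the chain can now be invoked with only a `question`.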

In [19]:
questions = [
    "What makes the Gemma model special?",
    "Why is Gemma model a new state-of-the-art?",
]

for question in questions:
    print(f"Question: {question}")
    print(f"Answer: {chain.invoke({'question': question})}")
    print()

Question: What makes the Gemma model special?
Answer: The Gemma model is unique because it offers new state-of-the-art open models that perform significantly better than larger models on key benchmarks while adhering to rigorous standards for safe and responsible outputs. This is a result of its design with AI Principles at the forefront, using automatable techniques to filter out sensitive data and incorporating extensive fine-tuining and RLHF from human feedback for aligned instructional behavior. Gemma also conducts robust evaluation with both manual red-teaming and adversarial testing to mitigate potential hazards. 

Additionally, the Model Card reveals that Gemma has undergone a new Responsible Generative AI Toolkit along with Gemma. This sets Gemma apart from other open models in terms of prioritizing safe and responsible AI applications.

Question: Why is Gemma model a new state-of-the-art?
Answer: The question seems to be asking why Gemma model, also known as Google's "Gemma op

## 5.1 Streaming Questions to the Language Model

`stream` yields the response in chunks as the model generates it, instead of returning the full answer at the end. Printing each chunk as it arrives gives the typewriter effect familiar from chatbots.

In [21]:
for s in chain.stream({"question": "¿puedo hablar con Gemma en español?"}):
    print(s, end="", flush=True)

Sí, podrías hablar con Gemma en español. El programa de desarrollo open source Gemma fue desarrollado por Google y publicado bajo una licencia abierta. Se trata de una plataforma que ofrece modelos de AI de alto rendimiento para la construcción de soluciones al ahorrar y mejorar los servicios de búsqueda y descarga de datos en el área informática y la biomédica, incluyendo el desarrollo de modelos de AI. 

El programa ofrece modelos de AI de alto rendimiento que se pueden usar para predecir las probabilidades de que un dato sea correcto en determinada situación, así como para resolver problemas de supervivencia y seguridad en los medios digitales. Además, Gemma también ofrece modelos de AI para la construcción de soluciones al ahorrar y mejorar el servicio de búsqueda y descarga de datos. 

Gemma ofrece un modelo de aprendizaje continuo basado en redes neuronales con ancho de banda amplio, lo que permite crear modelos altamente precisos para predecir la probabilidad de que un dato sea 
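The same consumption pattern works with any Python generator. The sketch below uses a hypothetical `fake_stream` function that yields an answer word by word, standing in for a model that produces tokens over time. It shows that the caller can print pieces immediately and still rebuild the full answer afterwards.

```python
from typing import Iterator


def fake_stream(answer: str) -> Iterator[str]:
    """Hypothetical model: yields the answer one word at a time."""
    for word in answer.split(" "):
        yield word + " "


pieces = []
for piece in fake_stream("Streaming prints text as it arrives"):
    print(piece, end="", flush=True)  # same printing pattern as the cell above
    pieces.append(piece)
print()

# Joining the pieces recovers the complete answer.
full_answer = "".join(pieces).strip()
```

`flush=True` matters here: without it, the output may be buffered and appear all at once, which hides the streaming effect.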

## 5.2 Batching Questions to the Language Model

`batch` sends a list of inputs to the chain in a single call and returns the answers in the same order. This is useful when you have many questions and don't want to wait for the model to process them one by one: by default, LangChain runs the invocations concurrently in a thread pool.

In [22]:
questions = [
    "Can I use TensorFlow and Keras with Gemma?",
    "Is there debugging support?",
]

In [23]:

chain.batch([{"question": q} for q in questions])

['Yes, TensoFlow and Keras are both fully compatible with Gemma, as mentioned in the document. These are open-source frameworks that can be used for training machine learning models on top of Gemma models. As a result, it is possible to use TensoFlow and Keras with Gemma for efficient and effective machine learning workflows.',
 'No, the context provided in the given document (based on the text material above) does not mention any debugging support for Gemma.']
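To illustrate the idea behind batching, the sketch below runs several slow calls concurrently with a thread pool from the standard library. `fake_invoke` is a hypothetical stand-in for a model call, with a short sleep in place of inference latency, and the `batch` function here is an illustration rather than LangChain's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_invoke(question: str) -> str:
    """Hypothetical model call; the sleep stands in for network or inference latency."""
    time.sleep(0.2)
    return f"answer to: {question}"


def batch(questions: list[str], max_workers: int = 4) -> list[str]:
    """Run the calls concurrently; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_invoke, questions))


batch_questions = ["first question", "second question", "third question"]
start = time.perf_counter()
answers = batch(batch_questions)
elapsed = time.perf_counter() - start
print(answers)
print(f"{elapsed:.2f}s for {len(batch_questions)} calls")
```

How much time this saves in practice depends on whether the backend can serve several requests at once; a single local model may still process them largely one after another.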