# PDF/Document Question Answering Tool Using RAG Architecture and LlamaIndex

# **Step 1: INSTALLATION OF LIBRARIES**  

In [None]:
!pip install -q  transformers==4.39 einops accelerate bitsandbytes
!pip install -q llama-index
!pip install -q llama-index-llms-huggingface
!pip install -q  sentence_transformers
!pip install -q llama-index-embeddings-huggingface
!pip install llmsherpa -q
!pip install -q llama-index-readers-smart-pdf-loader


In [None]:
# For document loading and vectorization
from llama_index.core import VectorStoreIndex,SimpleDirectoryReader

# For importing models from huggingface
from llama_index.llms.huggingface import HuggingFaceLLM

# For System and query prompt
from llama_index.core.prompts.prompts import SimpleInputPrompt

# For 4-bit quantization of the model
from transformers import BitsAndBytesConfig
import torch

# For embedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.core import Settings
from llama_index.core.node_parser import SentenceSplitter

# For Response times
import time

# **STEP 2: LOADING THE LLM AND EMBEDDING MODELS FROM HUGGING FACE** #

system_prompt="""You are a Q&A assistant. Try your best to answer the user's questions based on the information in the documents."""

query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")

In [None]:
system_prompt="""You are a Q&A assistant designed to answer user questions in a concise and informative way. Use the information from the documents and your knowledge to provide short answers that directly address the user's intent. Avoid generating irrelevant text before the answer.

  Answer correctly and precisely, and do not generate incomplete sentences.
  Do not add further queries to the response: answer the prompt, then stop generating. The response must be relevant only to the context extracted from the documents fed into the model.
  GIVE SHORT ANSWERS FOR ALL QUESTIONS: MINIMUM 2 SENTENCES, MAXIMUM 5 OR 6 SENTENCES ONLY
 **Generate only from the given context and not from your knowledge base. The prompts are always about the documents fed to you, and the responses must be relevant to the context from those documents.**

 **"Explain" and other open-ended questions (e.g., explain the campaign modules):**
  Provide a clear and concise explanation that addresses the user's intent. Keep responses within 5-6 lines; longer explanations are needed only when explaining a workflow or giving instructions for a task.
  Use the information from the documents to support your explanation,
  but avoid overly verbose or repetitive language. Summarize key points
  and avoid going into unnecessary detail. Reply in clear, complete sentences and use precise language.

  For example,

  Prompt: How can I host Twixor EnCaps back end?
  Answer: Based on the information retrieved from the document, Twixor EnCaps back end can be hosted either on-premise or on the Cloud.

  **If the answer cannot be found in the knowledge base, inform the user.**
"""

query_wrapper_prompt = SimpleInputPrompt("<|USER|>{query_str}<|ASSISTANT|>")
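
To make the role of `query_wrapper_prompt` concrete, the sketch below fills the same template with plain string formatting. It is a minimal illustration, not LlamaIndex's own code: `wrap_query` is a hypothetical helper, and joining the system prompt to the wrapped query with a single space is an assumption about how the two pieces are combined.

```python
# Hypothetical helper illustrating how a query wrapper template is filled.
WRAPPER_TEMPLATE = "<|USER|>{query_str}<|ASSISTANT|>"


def wrap_query(query_str: str, system_prompt: str = "") -> str:
    """Return the text that would be sent to the model for one query."""
    wrapped = WRAPPER_TEMPLATE.format(query_str=query_str)
    # Assumption: the system prompt is simply prepended, separated by a space.
    return f"{system_prompt} {wrapped}" if system_prompt else wrapped


print(wrap_query("How can I host Twixor EnCaps back end?"))
```

The `<|USER|>` and `<|ASSISTANT|>` markers only help if the model was trained on them. Mistral's instruct models are documented with an `[INST] ... [/INST]` layout, so it may be worth comparing both wrappers on a few prompts.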

In [None]:
!huggingface-cli login



In [None]:
quantization_config = BitsAndBytesConfig(
  load_in_4bit=True,
  bnb_4bit_compute_dtype=torch.float16,
  bnb_4bit_quant_type="nf4",
  bnb_4bit_use_double_quant=True,
)
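
A rough sense of why 4-bit loading matters on a single GPU can be had with back-of-the-envelope arithmetic. The sketch below assumes about 7.2 billion parameters for a 7B-class model (an approximation, not a figure stated in this notebook) and counts weights only. Activations, the KV cache, and quantization metadata are ignored, so real usage will be higher.

```python
# Rough weight-memory estimate; all figures are approximations.
ASSUMED_PARAMS = 7.2e9  # assumed parameter count for a 7B-class model


def weight_memory_gib(num_params: float, bits_per_param: float) -> float:
    """Memory needed to store the weights alone, in GiB."""
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / (1024 ** 3)


fp16_gib = weight_memory_gib(ASSUMED_PARAMS, 16)
nf4_gib = weight_memory_gib(ASSUMED_PARAMS, 4)
print(f"fp16 weights:  ~{fp16_gib:.1f} GiB")
print(f"4-bit weights: ~{nf4_gib:.1f} GiB")
```

Under these assumptions, 4-bit storage needs a quarter of the fp16 weight memory.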


In [None]:
llm = HuggingFaceLLM(
    context_window=2000,
    max_new_tokens=200,
    generate_kwargs={"temperature": 0.50, "do_sample": True},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="mistralai/Mistral-7B-Instruct-v0.2",
    model_name="mistralai/Mistral-7B-Instruct-v0.2",
    device_map="auto",
    model_kwargs={"torch_dtype": torch.float16, "quantization_config": quantization_config},
)


In [None]:
from llama_index.core.node_parser import SentenceSplitter
from llama_index.core import Settings

embed_model=HuggingFaceEmbedding(model_name="sentence-transformers/all-MiniLM-L12-v2")
Settings.transformations = [SentenceSplitter(chunk_size=1024, chunk_overlap=20)]
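
The effect of `chunk_size` and `chunk_overlap` can be illustrated with a simplified sliding window. This sketch is not how `SentenceSplitter` is implemented: it counts whitespace-separated words instead of tokens and ignores sentence boundaries. `split_with_overlap` is a hypothetical helper introduced only for illustration.

```python
from typing import List


def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into word windows where neighbours share `chunk_overlap` words."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in split_with_overlap(sample, chunk_size=4, chunk_overlap=1):
    print(chunk)
```

Each chunk repeats the last word of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.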


# **STEP 3: FEEDING DATA AND VECTORIZATION**

In [None]:
# prompt: unzip data.zip file
!unzip data.zip

Archive:  data.zip
  inflating: data/Prompts.txt        
  inflating: data/Twixor.pdf         


In [None]:
from llama_index.readers.smart_pdf_loader import SmartPDFLoader
llmsherpa_api_url = "https://readers.llmsherpa.com/api/document/developer/parseDocument?renderFormat=all"
pdf_url = "data/Twixor.pdf" # also allowed is a file path e.g. /home/downloads/xyz.pdf
pdf_loader = SmartPDFLoader(llmsherpa_api_url=llmsherpa_api_url)
document = pdf_loader.load_data(pdf_url)

In [None]:
index=VectorStoreIndex.from_documents(document, embed_model = embed_model, transformations = Settings.transformations)
query_engine=index.as_query_engine(llm=llm)
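
Conceptually, a vector index answers a query by embedding the question, ranking the stored chunk embeddings by similarity, and handing the best matches to the LLM as context. The sketch below shows only the ranking step on made-up three-dimensional vectors. Cosine similarity and `k=2` are assumptions chosen for illustration, and `top_k` is a hypothetical helper rather than a LlamaIndex function.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec: List[float], chunk_vecs: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k chunk ids most similar to the query, best first."""
    scored = [(cid, cosine_similarity(query_vec, vec)) for cid, vec in chunk_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up vectors standing in for real sentence embeddings.
toy_chunks = {
    "hosting": [0.9, 0.1, 0.0],
    "pricing": [0.1, 0.9, 0.1],
    "support": [0.0, 0.2, 0.9],
}
toy_query = [1.0, 0.2, 0.0]
for chunk_id, score in top_k(toy_query, toy_chunks):
    print(chunk_id, round(score, 3))
```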

Alternative: reading all documents in a directory


documents=SimpleDirectoryReader("data/").load_data()
# documents
index=VectorStoreIndex.from_documents(documents, embed_model = embed_model, transformations = Settings.transformations)
query_engine=index.as_query_engine(llm=llm)

# **STEP 4: PASSING PROMPTS AND THEIR RESPONSES**

In [None]:


def ask(prompt):
  """
  Ask the query engine a question, print the answer, and report the response time.

  Args:
      prompt (str): The question to ask the query engine.
  """
  start_time = time.time()
  response = query_engine.query(prompt)
  # Calculate response time
  response_time = time.time() - start_time

  # Defensive branch: a dict-shaped response carrying an 'answer' key
  if isinstance(response, dict):
    answer = response.get('answer')
    if answer:
      print(f"Answer: {answer}")
      print(f"\n Response Time: {response_time:.4f} seconds")
  else:
    # query_engine.query() returns a Response object rather than a dict, so this
    # is the branch that runs in practice; printing it shows the generated text
    print(response)
    print(f"Response Time: {response_time:.4f} seconds (Non-standard format)")


In [None]:


prompt_file = "data/Prompts.txt"  # Replace with your actual filename
start_t = time.time()
# Open the file in read mode
with open(prompt_file, "r") as file:
  # Loop through each line (prompt) in the file
  for line in file:
    # Remove surrounding whitespace, including the trailing newline
    prompt = line.strip()
    # Call the ask function with the current prompt
    print(f"Prompt: {prompt}")
    ask(prompt)
    # Note: a bare %time times an empty statement, so the CPU/Wall figures it
    # prints are microseconds and do not reflect the query latency
    %time
    print("-" * 100)

# Total time for all prompts, in seconds
end_t = time.time()
response_t = end_t - start_t
print(response_t)

Setting `pad_token_id` to `eos_token_id`:2 for open-end generation.


Prompt: What are the four main subject areas Hawking explores in the book?


According to the document, Stephen Hawking explores science, courage, disabled people's rights, and opportunities for disabled people in his book.
Response Time: 10.7875 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.96 µs
----------------------------------------------------------------------------------------------------
Prompt: In Hawking's view, what is the most fundamental question humanity faces?


According to Hawking, the most fundamental question humanity faces is understanding the quantum gravity laws and the birth of our universe.
Response Time: 2.4720 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.01 µs
----------------------------------------------------------------------------------------------------
Prompt: Describe Hawking's explanation for the origin of the universe.


Hawking challenged the idea that the universe began with a singularity, as this concept was not consistent with quantum mechanics. He believed that Einstein's general relativity, a classical theory, broke down near the Big Bang due to its assumption of well-defined particle positions and speeds. Instead, he suggested that there exists a certain level of randomness or uncertainty in nature, as described by the Uncertainty Principle. This principle states that it is impossible to precisely predict both the position and speed of a particle. Hawking's theories, which were grounded in quantum mechanics, have contributed to our ongoing understanding of the universe's origins.
Response Time: 11.3185 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.25 µs
----------------------------------------------------------------------------------------------------
Prompt: Does Hawking argue for the existence of God? Why or why not?


Hawking did not argue for or against the existence of God in his work. He stated that his work was about finding a rational framework to understand the universe around us, not about proving or disproving the existence of God.
Response Time: 4.2109 seconds (Non-standard format)
CPU times: user 1e+03 ns, sys: 1e+03 ns, total: 2 µs
Wall time: 5.48 µs
----------------------------------------------------------------------------------------------------
Prompt: What are some of the scientific arguments for and against time travel?


According to the context, there are arguments both for and against time travel based on scientific evidence. The arguments for time travel include the bending of light which shows that space-time is curved, and the Casimir effect which demonstrates that it can be warped in the negative direction. However, there are also arguments against time travel, such as the Chronology Protection Conjecture which suggests that the laws of physics conspire to prevent it on a macroscopic scale due to the potential problems it could cause with determinism and free will. The context also raises the question of why no one has come back from the future to tell us how to do it.
Response Time: 11.2883 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.48 µs
----------------------------------------------------------------------------------------------------
Prompt: Based on the book's discussion of future threats, what is one of the biggest challenges humanity might face

According to the context, one of the biggest challenges humanity might face is runaway climate change, which could lead to a rise in ocean temperature, melting of ice caps, and the release of large amounts of carbon dioxide, potentially making the climate like that of Venus.
Response Time: 5.5610 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.48 µs
----------------------------------------------------------------------------------------------------
Prompt: Does the book mention any potential benefits of colonizing space?


Yes, the context mentions several potential benefits of colonizing space. These include the discovery of water and oxygen on Mars, the availability of minerals and metals due to volcanic activity, and the potential for advancing human knowledge and technology. The context also mentions the historical significance of space exploration and the impact it had on humanity.
Response Time: 6.0021 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.48 µs
----------------------------------------------------------------------------------------------------
Prompt: Does Hawking express optimism or concern about the future of AI?


According to the context provided, Hawking's contributions have largely taken the form of asking questions rather than providing answers. Therefore, his views on the future of AI are not explicitly stated in the given text.
Response Time: 3.8076 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.72 µs
----------------------------------------------------------------------------------------------------
Prompt: Based on Hawking's overall arguments, what seems to be his biggest hope for the future of humanity?


Based on Hawking's eulogy and the publisher's note, it appears that Hawking's biggest hope for the future of humanity is the mastery of quantum gravity laws and the full comprehension of the birth of our universe.
Response Time: 4.9348 seconds (Non-standard format)
CPU times: user 3 µs, sys: 1e+03 ns, total: 4 µs
Wall time: 9.78 µs
----------------------------------------------------------------------------------------------------
Prompt: What is the significance of the Black Hole Information Paradox discussed in the book?


The Black Hole Information Paradox is a significant issue in theoretical physics that questions whether information is lost when an object falls into a black hole or not. Despite decades of research, a definitive answer has yet to be found.
Response Time: 4.1384 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.48 µs
----------------------------------------------------------------------------------------------------
Prompt: How does the book differentiate between scientific theories and philosophical beliefs?


According to the context, the book does not explicitly differentiate between scientific theories and philosophical beliefs. However, it suggests that scientific theories are based on observations and equations, while philosophical beliefs are presented in a clear way without equations. The author believes that most people can understand and appreciate the basic ideas of scientific theories if they are presented in a clear way.
Response Time: 6.9439 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 5.25 µs
----------------------------------------------------------------------------------------------------
Prompt: According to Hawking, how might advancements in physics help us answer some of the big questions about the universe?


According to Hawking, when we ultimately master the quantum gravity laws, we may be able to comprehend fully the birth of our universe. This could represent a significant breakthrough in answering some of the big questions about the universe. (Refer to the quote "When ultimately we master the quantum gravity laws, and comprehend fully the birth of our universe, it may largely be by standing on the shoulders of Hawking.")
Response Time: 7.1475 seconds (Non-standard format)
CPU times: user 2 µs, sys: 1e+03 ns, total: 3 µs
Wall time: 6.2 µs
----------------------------------------------------------------------------------------------------
Prompt: Imagine you could interview Stephen Hawking about the future of space exploration. What would you ask him?


Given the context, a potential question for Stephen Hawking about the future of space exploration could be: "Professor Hawking, with your profound understanding of the universe and its mysteries, what do you foresee as the most significant achievements in space exploration in the coming decades, and how might they impact our understanding of the cosmos and our place within it?"
Response Time: 6.9001 seconds (Non-standard format)
CPU times: user 5 µs, sys: 0 ns, total: 5 µs
Wall time: 8.11 µs
----------------------------------------------------------------------------------------------------
Prompt: How does Hawking's approach differ from "A Brief History of Time"?


Hawking's original title for his book, "From the Big Bang to Black Holes: A Short History of Time," was longer than the published title, "A Brief History of Time." The published title was shortened, resulting in a difference in the emphasis and potential scope of the content.
Response Time: 5.4216 seconds (Non-standard format)
CPU times: user 4 µs, sys: 0 ns, total: 4 µs
Wall time: 5.72 µs
----------------------------------------------------------------------------------------------------
Prompt: After reading the book, what is one big question you are now curious to learn more about?
Based on the context, the author expressed his fascination with the big questions about the universe and our understanding of it. No specific question was mentioned in the text that the author is currently curious about learning more about.
Response Time: 4.5111 seconds (Non-standard format)
CPU times: user 3 µs, sys: 0 ns, total: 3 µs
Wall time: 16.9 µs
---------------------------------------------------
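
The per-prompt response times printed in the run above can be summarized with the standard library. This is a small sketch: the values are copied by hand from that output, and the choice of summary statistics is only one reasonable option.

```python
import statistics

# Per-prompt response times (seconds) copied from the run output above.
response_times = [
    10.7875, 2.4720, 11.3185, 4.2109, 11.2883,
    5.5610, 6.0021, 3.8076, 4.9348, 4.1384,
    6.9439, 7.1475, 6.9001, 5.4216, 4.5111,
]

summary = {
    "count": len(response_times),
    "mean": statistics.mean(response_times),
    "median": statistics.median(response_times),
    "min": min(response_times),
    "max": max(response_times),
}
for name, value in summary.items():
    # Floats are shown with four decimals, the count as a plain integer.
    print(f"{name}: {value:.4f}" if isinstance(value, float) else f"{name}: {value}")
```

Collecting the timings in a list inside `ask` (for example by returning `response_time`) would avoid the manual copy.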


In [None]:
# prompt: create an infinite loop that prints hello every five seconds

import time

while True:
  print('Hello')
  # Note: 300 seconds is five minutes; use time.sleep(5) for five seconds
  time.sleep(300)


Hello


KeyboardInterrupt: 