### Retrieval-Augmented Generation (RAG)
Retrieval-Augmented Generation (RAG) combines the generative ability of LLMs with the precision of an information retrieval system: relevant passages are fetched from a knowledge base and supplied to the model as context. The approach suits applications where answers must reflect current or domain-specific information, such as dynamic content creation or complex decision support systems.
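
To make the retrieve-then-generate flow concrete, here is a minimal sketch in plain Python. It is not how LlamaIndex works internally: relevance is approximated by counting shared words instead of comparing embeddings, and the names `retrieve`, `build_prompt`, and the sample passages are hypothetical, chosen only for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into a set of words."""
    return set(re.findall(r"\w+", text.lower()))


def retrieve(query, passages, top_k=2):
    """Rank passages by the number of words shared with the query (toy score)."""
    query_words = tokenize(query)
    scored = [(len(query_words & tokenize(p)), p) for p in passages]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep only passages that share at least one word with the query.
    return [p for score, p in scored[:top_k] if score > 0]


def build_prompt(query, context_passages):
    """Combine the retrieved context and the question into one prompt string."""
    context = "\n".join(f"- {p}" for p in context_passages)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Hypothetical knowledge base, for illustration only.
passages = [
    "Teaching is an interactive process between teacher and pupil.",
    "Ollama serves language models on a local machine.",
    "A vector store keeps embeddings for similarity search.",
]
question = "What is teaching?"
prompt = build_prompt(question, retrieve(question, passages, top_k=1))
print(prompt)
```

In a real pipeline the prompt would then be sent to the LLM; the rest of this walkthrough replaces each toy piece with a LlamaIndex component.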

### Step 1: Create a Directory and Add PDFs
Create a directory named pdfs in your project folder and place some PDF files in it.

### Step 2: Install Necessary Libraries
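Install the libraries used in this walkthrough: LlamaIndex, a PDF parser, the Hugging Face embedding and Ollama LLM integrations for LlamaIndex, and LangChain. Uncomment the lines below if they are not already installed: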

In [24]:
# ! pip install llama-index
# ! pip install PyPDF2
# ! pip install llama-index-embeddings-huggingface
# ! pip install llama-index-llms-ollama
# ! pip install langchain


### Download sample data

In [10]:
import os
import requests

# Create the directory if it doesn't exist
os.makedirs("pdfs", exist_ok=True)

# List of URLs to download PDFs from
pdf_urls = [
    "https://files.eric.ed.gov/fulltext/EJ1245288.pdf",
    "https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf"
]

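# Download each PDF, failing loudly on HTTP errors instead of saving an error page
for url in pdf_urls:
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    file_name = os.path.join("pdfs", url.split("/")[-1])
    with open(file_name, "wb") as f:
        f.write(response.content)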

print("PDFs downloaded successfully.")


PDFs downloaded successfully.
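
A successful HTTP response does not guarantee that the saved file is a PDF, since some servers return an HTML page instead. The sketch below shows one way to guard against that, assuming it is enough to check that the content starts with the `%PDF-` marker that opens every PDF file. The helper names `filename_from_url` and `looks_like_pdf` are hypothetical and not part of any library.

```python
import os
from urllib.parse import urlparse


def filename_from_url(url, default="download.pdf"):
    """Take the last path segment of the URL, ignoring any query string."""
    name = os.path.basename(urlparse(url).path)
    return name or default


def looks_like_pdf(data):
    """Return True if the bytes begin with the PDF header marker."""
    return data.startswith(b"%PDF-")


print(filename_from_url("https://files.eric.ed.gov/fulltext/EJ1245288.pdf"))
print(looks_like_pdf(b"%PDF-1.4 ..."))
print(looks_like_pdf(b"<html>Not found</html>"))
```

In the download loop, `looks_like_pdf(response.content)` could be checked before writing the file, skipping or reporting any URL that fails the check.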


### Custom Knowledge Base
A custom knowledge base is a collection of relevant, up-to-date information that serves as the foundation for RAG. It can be a database, a set of documents, or a combination of both. Here it is the set of PDFs you provide, which serves as the source of truth for answering user queries.

The following code loads the PDF documents with LlamaIndex's SimpleDirectoryReader, passing the file paths explicitly through input_files:

In [11]:
from llama_index.core import SimpleDirectoryReader

# Paths of the PDF files to load
# (alternatively, SimpleDirectoryReader(input_dir="./pdfs") loads every file in the directory)
data_file = ["./pdfs/dummy.pdf", "./pdfs/EJ1245288.pdf"]

# Load the files; the reader returns one Document per PDF page
documents = SimpleDirectoryReader(input_files=data_file).load_data()

# Print the loaded documents to verify
if documents:
    print("Loaded the following documents:")
    for doc in documents:
        print(doc)
else:
    print("No documents loaded.")

Loaded the following documents:
Doc ID: b54475d0-9197-4935-b659-b872972f6fa9
Text: Dumm y PDF file
Doc ID: a60452a2-f7d7-4970-8899-1994b3e846e3
Text: http://www.shanlaxjournals.in 5 Shanlax International Journal of
Educationshanlax #SINCE1990Concept of Teaching  Isola Rajagopalan
Principal (Retired)  District Institute of Educational Training
(DIET), Tirumagalam, TamilNadu, India Abstract Edmund Amidon defined
teaching as “an interactive process, primarily involving classroom
talk which take...
Doc ID: 52461927-cafd-4e74-abb1-ddf30f80e1e9
Text: http://www.shanlaxjournals.in 6 Shanlax International Journal of
Education  Edmund Amidon (1967) defined teaching as “an  interactive
process, primarily involving classroom  talk which takes place between
teacher and pupil and  occurs during certain definable
activities”.Davis et  al. (1962), Gagne et al. (1974) and Gage (1978)
have  contributed...
Doc ID: baac538a-5e0d-410f-b64e-b31ebd768793
Text: http://www.shanlaxjournals.in 7 Shanlax Interna

### Embeddings model
An embedding represents text as a numerical vector that machine learning models can take as input; texts with similar meaning map to nearby vectors. The embedding model is responsible for converting text into these vectors.
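
Cosine similarity is a common way to measure how close two such vectors are. The sketch below uses hand-written three-dimensional vectors as stand-ins for real embeddings, which have hundreds or thousands of dimensions; the values and variable names are made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # similarity is undefined for a zero vector; treat as unrelated
    return dot / (norm_a * norm_b)


# Toy "embeddings": the first two point in a similar direction, the third does not.
teacher = [0.9, 0.1, 0.0]
pupil = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

print(cosine_similarity(teacher, pupil))    # close to 1: similar direction
print(cosine_similarity(teacher, invoice))  # close to 0: nearly orthogonal
```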


In [15]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-large-en-v1.5", trust_remote_code=True)




### Vector databases

A vector database stores pre-computed vector representations of text for fast retrieval and similarity search, with capabilities like CRUD operations, metadata filtering, and horizontal scaling. By default, LlamaIndex uses a simple in-memory vector store that’s great for quick experimentation.
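
As a rough mental model of an in-memory vector store, the sketch below keeps normalized vectors in a list and answers queries by brute-force similarity search. The class `TinyVectorStore` is hypothetical and far simpler than LlamaIndex's actual store; it only illustrates the add, delete, and top-k query operations.

```python
import math


class TinyVectorStore:
    """Toy in-memory store: a list of (id, unit vector, text) rows."""

    def __init__(self):
        self._items = []

    @staticmethod
    def _normalize(vector):
        """Scale the vector to unit length so a dot product equals cosine similarity."""
        norm = math.sqrt(sum(x * x for x in vector))
        return [x / norm for x in vector] if norm else list(vector)

    def add(self, item_id, vector, text):
        self._items.append((item_id, self._normalize(vector), text))

    def delete(self, item_id):
        self._items = [row for row in self._items if row[0] != item_id]

    def query(self, vector, top_k=2):
        """Return the top_k (id, text) pairs most similar to the query vector."""
        q = self._normalize(vector)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), item_id, text)
            for item_id, vec, text in self._items
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return [(item_id, text) for _, item_id, text in scored[:top_k]]


store = TinyVectorStore()
store.add("a", [1.0, 0.0], "about teaching")
store.add("b", [0.0, 1.0], "about invoices")
store.add("c", [0.9, 0.1], "about pupils")
print(store.query([1.0, 0.0], top_k=2))
```

Brute-force search like this scales linearly with the number of stored vectors, which is one reason dedicated vector databases use approximate nearest-neighbor indexes.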


In [17]:
from llama_index.core import Settings
from llama_index.core import VectorStoreIndex

# ====== Create vector store and upload indexed data ======
Settings.embed_model = embed_model # we specify the embedding model to be used
index = VectorStoreIndex.from_documents(documents)

### Retriever module

The retriever uses the query string to fetch relevant context, then sends both the query and the context as a prompt to the LLM, which generates the final natural-language response. The LLM used here is Llama 3, served locally by Ollama. The final response is displayed in the user interface.


In [20]:
from llama_index.llms.ollama import Ollama
from llama_index.core import Settings

# setting up the llm
llm = Ollama(model="llama3", request_timeout=120.0)

# ====== Setup a query engine on the index previously created ======
Settings.llm = llm # specifying the llm to be used
query_engine = index.as_query_engine(streaming=True, similarity_top_k=4)

### Prompt template
A custom prompt template is used to refine the response from the LLM and to include the context as well:


In [27]:
from langchain_core.prompts.prompt import PromptTemplate


In [32]:
examples = [
    {
        "question": "Who lived longer, Muhammad Ali or Alan Turing?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
""",
    },
    {
        "question": "When was the founder of craigslist born?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the founder of craigslist?
Intermediate answer: Craigslist was founded by Craig Newmark.
Follow up: When was Craig Newmark born?
Intermediate answer: Craig Newmark was born on December 6, 1952.
So the final answer is: December 6, 1952
""",
    },
    {
        "question": "Who was the maternal grandfather of George Washington?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who was the mother of George Washington?
Intermediate answer: The mother of George Washington was Mary Ball Washington.
Follow up: Who was the father of Mary Ball Washington?
Intermediate answer: The father of Mary Ball Washington was Joseph Ball.
So the final answer is: Joseph Ball
""",
    },
    {
        "question": "Are both the directors of Jaws and Casino Royale from the same country?",
        "answer": """
Are follow up questions needed here: Yes.
Follow up: Who is the director of Jaws?
Intermediate answer: The director of Jaws is Steven Spielberg.
Follow up: Where is Steven Spielberg from?
Intermediate answer: The United States.
Follow up: Who is the director of Casino Royale?
Intermediate answer: The director of Casino Royale is Martin Campbell.
Follow up: Where is Martin Campbell from?
Intermediate answer: New Zealand.
So the final answer is: No
""",
    },
]

In [33]:
example_prompt = PromptTemplate(
    input_variables=["question", "answer"], template="Question: {question}\n{answer}"
)

print(example_prompt.format(**examples[0]))

Question: Who lived longer, Muhammad Ali or Alan Turing?

Are follow up questions needed here: Yes.
Follow up: How old was Muhammad Ali when he died?
Intermediate answer: Muhammad Ali was 74 years old when he died.
Follow up: How old was Alan Turing when he died?
Intermediate answer: Alan Turing was 41 years old when he died.
So the final answer is: Muhammad Ali
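
The formatted examples still have to be combined with the retrieved context and the user's question to form the final prompt. The sketch below does this with plain string formatting rather than a LangChain class; the function `build_few_shot_prompt`, the suffix layout, and the sample values are assumptions made for illustration.

```python
EXAMPLE_TEMPLATE = "Question: {question}\n{answer}"
SUFFIX_TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\n"


def build_few_shot_prompt(examples, context, question):
    """Format each worked example, then append the context and the new question."""
    parts = [EXAMPLE_TEMPLATE.format(**example) for example in examples]
    parts.append(SUFFIX_TEMPLATE.format(context=context, question=question))
    return "\n\n".join(parts)


# A single shortened example, for illustration only.
demo_examples = [
    {
        "question": "When was the founder of craigslist born?",
        "answer": "So the final answer is: December 6, 1952",
    }
]
prompt = build_few_shot_prompt(
    demo_examples,
    context="Teaching is an interactive process.",
    question="How is teaching defined?",
)
print(prompt)
```

Ending the prompt with the open question leaves the model to continue in the same follow-up style the examples demonstrate.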

