# Retrieval-Augmented Generation (RAG)




### Introduction

In the world of natural language processing (NLP), models like ChatGPT have become household names. These models are pre-trained on vast amounts of text data up to a certain point in time, known as their "knowledge cutoff." While incredibly versatile, their static nature means they can't incorporate information or events that occur after this cutoff. This is where Retrieval-Augmented Generation (RAG) comes in, blending the generative capabilities of models like ChatGPT with the dynamic, up-to-date knowledge from external sources.


### How RAG Works




![RAG](https://taesiri.xyz/data/rag2.png)



RAG enhances traditional language models through a two-stage process:

1. **Retrieval Stage**: The system queries a continuously updated database or knowledge base to find information relevant to the input query. This allows the model to access the most current data, even if it's beyond its original training cutoff.

2. **Generation Stage**: Leveraging a generative model (e.g., GPT), RAG integrates the context from the retrieved documents to produce informed and relevant text. This step ensures that the generation is not only based on the model's pre-trained knowledge but is also augmented with the latest information.
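
The two stages above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the `knowledge_base` sentences, the word-overlap scorer, and the prompt wording are all made up for this example, and a real system would use embeddings for retrieval and send the prompt to a language model.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def retrieve(query, documents, top_k=1):
    """Retrieval stage (toy version): rank documents by shared word count."""
    query_counts = Counter(tokenize(query))

    def overlap(doc):
        # Counter intersection keeps the minimum count of each shared word.
        return sum((query_counts & Counter(tokenize(doc))).values())

    return sorted(documents, key=overlap, reverse=True)[:top_k]

def build_prompt(query, context_chunks):
    """Generation stage input: place the retrieved context ahead of the question."""
    context = "\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

# Hypothetical knowledge base, invented for this sketch.
knowledge_base = [
    "The library opens at 9 am on weekdays.",
    "The cafeteria serves lunch from noon until 2 pm.",
    "Parking permits are renewed every September.",
]

query = "When does the library open?"
top_chunks = retrieve(query, knowledge_base, top_k=1)
prompt = build_prompt(query, top_chunks)
print(prompt)  # this string is what would be passed to the generative model
```

The key design point is that the generative model never searches the knowledge base itself; it only sees whatever the retrieval stage placed in the prompt.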

### Key Applications

- **Question Answering**: RAG systems can answer questions with the most current information, overcoming the knowledge cutoff limitation of standalone generative models.
- **Conversational Agents**: Chatbots powered by RAG can provide users with up-to-date answers, making them more useful for current events and news-related queries.

### Advantages

- **Current Information**: RAG allows language models to break free from their knowledge cutoff, making them more relevant for today's rapidly changing world.
- **Depth and Accuracy**: The retrieval component grounds the generated content in source documents, which helps make it contextually relevant, more informative, and less prone to factual errors.
- **Adaptability**: By changing the external data sources, RAG can be tailored to different domains and information needs.

### [LlamaIndex](https://www.llamaindex.ai/)

LlamaIndex is a versatile data framework designed to connect custom data sources with large language models (LLMs) like GPT-4. It serves as a bridge between enterprise data and LLM applications, enabling the ingestion, structuring, retrieval, and integration of data for various applications. LlamaIndex allows for the loading of data from over 160 sources in different formats, indexing this data for diverse use cases, and orchestrating LLM workflows efficiently. It offers a comprehensive suite of modules to evaluate LLM application performance and seamlessly integrates with observability partners. Additionally, LlamaIndex boasts a thriving developer network, community contributions, and integration options with various services.


### Project

In this notebook, we'll guide you through the process of leveraging LlamaIndex to enhance information retrieval and text generation. Specifically, we will demonstrate how to use LlamaIndex to upload a PDF document and dissect it into manageable segments. These segments will then be systematically stored in a vector database, designed for efficient querying. When a query is submitted, our system will search this database to find the most relevant document segments. The most pertinent segment – or `chunk` – will be retrieved as the context to address the query. Subsequently, this context will be provided to a compact language model, in this case, [Microsoft Phi-2](https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/). We'll then instruct Phi-2 to craft a response, drawing upon both the specific question posed and the context supplied by the selected document chunk. This method showcases the synergy between advanced retrieval techniques and modern language models to generate informed, contextually relevant responses.
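
To make the idea of "manageable segments" concrete, here is a minimal sketch of splitting text into overlapping chunks. It is an illustration only: the function name `chunk_text`, the character-based sizes, and the sample sentence are assumptions for this example, and LlamaIndex's own splitter works differently (as far as we know it measures size in tokens and tries to respect sentence boundaries).

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into character windows that overlap by `overlap` characters."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks

# Made-up sample text for illustration.
sample = "Computers have evolved from room-sized machines to portable devices used in daily life."
chunks = chunk_text(sample, chunk_size=40, overlap=10)
for i, chunk in enumerate(chunks):
    print(i, repr(chunk))
```

The overlap means a sentence that straddles a chunk boundary still appears intact in at least one chunk, at the cost of storing some text twice.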










## Installing dependencies

(This step might take longer than 10 minutes)

In [None]:
!pip install -q pypdf
!pip install -q python-dotenv
!pip install -q llama-index
!pip install -q gradio
!pip install -q einops
!pip install -q accelerate
!pip install -q llama-index-embeddings-huggingface
!pip install -q llama-index-embeddings-instructor
!pip install -q llama-index-llms-huggingface
!pip install -q llama-index-llms-openai


In [None]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


In [None]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, ServiceContext, PromptTemplate
from llama_index.llms.huggingface import HuggingFaceLLM
import torch

In [None]:
!mkdir -p Data

In [None]:
# TODO - Upload a single PDF document into Data folder

# We use SimpleDirectoryReader to load all the documents in a folder.
# Note: for PDFs it returns one Document per page, so the count can exceed the number of files.
documents = SimpleDirectoryReader("./Data").load_data()

In [None]:
len(documents)

2

In [None]:
system_prompt = "You are a Q&A assistant. Your goal is to answer questions as accurately as possible based on the instructions and context provided."

# This will wrap the default prompts that are internal to llama-index
query_wrapper_prompt = PromptTemplate("<|USER|>{query_str}<|ASSISTANT|>")
query_wrapper_prompt

PromptTemplate(metadata={'prompt_type': <PromptType.CUSTOM: 'custom'>}, template_vars=['query_str'], kwargs={}, output_parser=None, template_var_mappings=None, function_mappings=None, template='<|USER|>{query_str}<|ASSISTANT|>')
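
To see what the wrapper does to a question, the sketch below fills the same template string with plain Python string formatting. It does not use LlamaIndex at all; the helper `wrap_query` is hypothetical, and the way the system prompt is joined to the wrapped query (prepended with a single space) is our assumption about the library's behavior rather than something this notebook verifies.

```python
# Same template text as query_wrapper_prompt above, kept as a plain string.
QUERY_WRAPPER = "<|USER|>{query_str}<|ASSISTANT|>"

def wrap_query(query_str, system_prompt=""):
    """Fill the template and optionally prepend a system prompt."""
    wrapped = QUERY_WRAPPER.format(query_str=query_str)
    # Assumption: the system prompt is simply prepended, separated by one space.
    return f"{system_prompt} {wrapped}" if system_prompt else wrapped

print(wrap_query("What is the impact of computers on education?"))
print(wrap_query("What is the impact of computers on education?",
                 system_prompt="You are a Q&A assistant."))
```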

In [None]:
llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={"temperature": 0.0, "do_sample": False},
    system_prompt=system_prompt,
    query_wrapper_prompt=query_wrapper_prompt,
    tokenizer_name="microsoft/phi-2",
    model_name="microsoft/phi-2",
    device_map="cuda",
    # load the weights in bfloat16 to reduce GPU memory usage
    model_kwargs={"torch_dtype": torch.bfloat16}
)


The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.


In [None]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# loads BAAI/bge-small-en-v1.5 from huggingface for embedding - https://huggingface.co/BAAI/bge-small-en-v1.5

embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

service_context = ServiceContext.from_defaults(
    chunk_size=1024,
    llm=llm,
    embed_model=embed_model
)



In [None]:
# Create a vector index from the document chunks

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

query_engine = index.as_query_engine()

# Query the index and return the generated answer as plain text.
# Gradio's ChatInterface calls this with (message, history); history is unused here.
def predict(message, history):
    response = query_engine.query(message)
    return str(response)
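
Conceptually, querying a vector index means comparing the query's embedding with each chunk's embedding and keeping the closest matches. The sketch below shows that idea with cosine similarity in plain Python. The three-dimensional vectors and chunk names are invented for illustration (real embeddings have hundreds of dimensions), `k=2` is an arbitrary choice, and this is not how LlamaIndex implements the lookup internally.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def top_k(query_vec, chunk_vecs, k=2):
    """Return the k (score, chunk_id) pairs with the highest similarity."""
    scored = [(cosine_similarity(query_vec, vec), chunk_id)
              for chunk_id, vec in chunk_vecs.items()]
    scored.sort(reverse=True)
    return scored[:k]

# Hypothetical chunk embeddings, invented for this sketch.
chunk_vectors = {
    "education": [0.9, 0.1, 0.0],
    "economy": [0.1, 0.9, 0.0],
    "media": [0.0, 0.2, 0.9],
}
query_vector = [0.8, 0.2, 0.1]  # pretend embedding of an education question

results = top_k(query_vector, chunk_vectors, k=2)
for score, chunk_id in results:
    print(f"{chunk_id}: {score:.3f}")
```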


In [None]:
# TODO: Try querying the engine with multiple questions and examine the response and source_nodes
r = query_engine.query("What is impact of Computers on Education?")

Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.


In [None]:
# examine the response
r

Response(response='Computers have broadened access to knowledge and learning resources. Online \ncourses, educational software, and access to vast databases have facilitated a more interactive \nand flexible learning environment. Students can now learn at their ow n pace and explore subjects \nbeyond the confines of traditional classrooms. Moreover, computers have become essential in \nresearch, where they assist in simulations, calculations, and data management, accelerating \nscientific discoveries and innovation.\n', source_nodes=[NodeWithScore(node=TextNode(id_='c4848119-2c98-49a9-8499-56b54615875b', embedding=None, metadata={'page_label': '1', 'file_name': 'The Impact and Evolution of Computers.pdf', 'file_path': '/content/Data/The Impact and Evolution of Computers.pdf', 'file_type': 'application/pdf', 'file_size': 28095, 'creation_date': '2024-04-22', 'last_modified_date': '2024-04-22'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_m

In [None]:
# examine the source nodes used for the answer
r.source_nodes[0]

NodeWithScore(node=TextNode(id_='c4848119-2c98-49a9-8499-56b54615875b', embedding=None, metadata={'page_label': '1', 'file_name': 'The Impact and Evolution of Computers.pdf', 'file_path': '/content/Data/The Impact and Evolution of Computers.pdf', 'file_type': 'application/pdf', 'file_size': 28095, 'creation_date': '2024-04-22', 'last_modified_date': '2024-04-22'}, excluded_embed_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], excluded_llm_metadata_keys=['file_name', 'file_type', 'file_size', 'creation_date', 'last_modified_date', 'last_accessed_date'], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='d0724d36-01e0-437c-b4e5-cbbb5dbb1993', node_type=<ObjectType.DOCUMENT: '4'>, metadata={'page_label': '1', 'file_name': 'The Impact and Evolution of Computers.pdf', 'file_path': '/content/Data/The Impact and Evolution of Computers.pdf', 'file_type': 'application/pdf', 'file_size': 28095, 'creation_dat
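
The raw repr above is hard to read, so a small helper that pulls out the score, file name, page, and a text preview can make inspection easier. The sketch below runs without LlamaIndex by using stand-in classes; `StandInNode`, `StandInNodeWithScore`, `summarize_sources`, and the score value are all hypothetical. It assumes each source item exposes `.score`, `.node.text`, and `.node.metadata` with the `file_name` and `page_label` keys seen in the output above.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class StandInNode:
    """Hypothetical stand-in for a text node: text plus a metadata dict."""
    text: str
    metadata: dict = field(default_factory=dict)

@dataclass
class StandInNodeWithScore:
    """Hypothetical stand-in for a retrieved node paired with its score."""
    node: StandInNode
    score: Optional[float] = None

def summarize_sources(source_nodes, preview_chars=50):
    """Build one compact dict per retrieved source for easy reading."""
    rows = []
    for item in source_nodes:
        meta = item.node.metadata
        rows.append({
            "score": None if item.score is None else round(item.score, 3),
            "file": meta.get("file_name", "unknown"),
            "page": meta.get("page_label", "unknown"),
            "preview": item.node.text[:preview_chars],
        })
    return rows

example_sources = [
    StandInNodeWithScore(
        node=StandInNode(
            text="Computers have broadened access to knowledge and learning resources.",
            metadata={"file_name": "The Impact and Evolution of Computers.pdf",
                      "page_label": "1"},
        ),
        score=0.81234,  # made-up score for illustration
    )
]

rows = summarize_sources(example_sources)
for row in rows:
    print(row)
```

If the assumption about attribute names holds, the same helper could be called as `summarize_sources(r.source_nodes)` in this notebook.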

### Using a Chat Interface

Below, we create a chat interface that lets you ask questions about the indexed document. Please use this chat application to ask 10 different questions and then report your findings in the cell below.

In [None]:
import gradio as gr

gr.ChatInterface(predict).launch(share=True)





In [None]:
# Write your answer and analysis here.

**Here is my chat where I asked exactly 10 questions:**

**Question 1**

    What is impact of Computers on Education?

    Computers have broadened access to knowledge and learning resources. Online
    courses, educational software, and access to vast databases have facilitated a more interactive
    and flexible learning environment. Students can now learn at their ow n pace and explore subjects
    beyond the confines of traditional classrooms. Moreover, computers have become essential in
    research, where they assist in simulations, calculations, and data management, accelerating
    scientific discoveries and innovation.


**Question 2**

    what is the impact of computers on economy?

    Computers have transformed industries by streamlining operations, enhancing
    productivity, and creating a global marketplace. They are integral tools in business, finance, and
    manufacturing, where they perform tasks ranging from data analysis to automation and control of complex
    systems.


**Question 3**

    what is the role of AI in computers?

    AI promises to enhance decision-making and automation.

**Question 4**

    Why do we need fast computers?

    We need fast computers because they can perform tasks more quickly and efficiently, allowing us to accomplish more in less time.
    This is particularly important in fields such as data analysis, scientific research, and high-performance computing, where speed is crucial for processing large amounts of data or running complex simulations. Additionally, fast computers enable us to access and utilize the vast amount of information available on the internet, enhancing our ability to learn, communicate, and stay connected.

**Question 5**

    What are the 3 laws of motion in physics?

    The three laws of motion in physics are:
    1. An object at rest will remain at rest, and an object in motion will continue in motion with the same speed and in the same direction unless acted upon by an unbalanced force.
    2. The rate of change of momentum of an object is directly proportional to the force applied to it and occurs in the direction of the force.
    3. For every action, there is an equal and opposite reaction.

**Question 6**

    How have computers evolved?

    Computers have evolved from large, room-sized machines to sleek, portable devices. They have become more affordable and accessible to the general public.
    The introduction of personal computers (PCs) in the late 20th century was a pivotal moment, democratizing access to computing power and information.


**Question 7**

    how can we use computers in science?

    Computers are essential tools in science, aiding in simulations, calculations,
    and data management. They assist in scientific discoveries and innovation, accelerating the pace
    of research. Computers are used in various scientific fields, including physics, chemistry, biology,
    and astronomy, to model complex systems, analyze large datasets, and simulate experiments. They
    play a crucial role in advancing our understanding of the natural world and developing new technologies.


**Question 8**

    what is the impact of computers on media?

    Computers have revolutionized the media industry, enabling digital filmmaking,
    video games, music production, and other forms of creative expression. They have become an
    indispensable part of daily life, managing personal information through devices like smartphones
    and smartwatches.

**Question 9**

    how can I use computer to make money?

    There are several ways to use computers to make money. One common method is to
    start an online business, such as an e-commerce store or a freelance writing or graphic design
    business. Another option is to invest in the stock market or cryptocurrency, using computer
    software to analyze market trends and make informed investment decisions. Additionally, you can
    earn money by creating and selling digital products, such as e-books, online courses, or digital
    content. The possibilities are vast, and it's important to explore different avenues and find the
    opportunities that align with your skills and interests.


**Question 10**

    What are four use cases of Computer?

    The four use cases of computers are:
    1. Computers are used in business, finance, and manufacturing to streamline operations, enhance productivity, and automate complex systems.
    2. Computers have revolutionized communication, enabling instant connectivity across the globe through the internet and social media platforms.
    3. Computers have broadened access to knowledge and learning resources, facilitating online courses, educational software, and access to vast databases.
    4. Computers have become an indispensable part of daily life, managing personal information through devices like smartphones and smartwatches.

The responses above are grouped below into three scenarios, according to how much of the required information is present in the PDF "The Impact and Evolution of Computers":

### 1. Information Directly Present in the PDF
- **Question on Education and Computers:**
  - **Query:** "What is impact of Computers on Education?"
  - **Response:** The answer closely aligns with the content of the PDF, covering broader access to knowledge, the role of computers in learning environments, and their importance in research. This indicates that retrieval and generation both worked as intended, with the answer drawn directly from the document.
- **Question on Computers in the Economy:**
  - **Query:** "what is the impact of computers on economy?"
  - **Response:** The response closely mirrors the PDF's discussion of how computers have transformed industries, enhanced productivity, and created a global marketplace. This is the ideal outcome for a RAG system when the query matches the document content.
- **Question on Four Use Cases of Computers:**
  - **Query:** "What are four use cases of Computer?"
  - **Response:** Here the relevant information was dispersed throughout the PDF rather than concentrated in a single paragraph. The system still gathered it from those different places without going beyond the provided context.

### 2. Information Partly Present in the PDF
- **Question on the use of computers in science:**
  - **Query:** "how can we use computers in science?"
  - **Response:** The system provided a detailed answer that covered the information in the PDF as well as information the PDF does not contain. Here, the system appears to have combined the retrieved context with the model's general knowledge.

### 3. Information Not in the PDF
- **Question on Physics (Three Laws of Motion):**
  - **Query:** "What are the 3 laws of motion in physics?"
  - **Response:** This question is unrelated to the document's content. The system correctly provided the three laws of motion using its pre-trained knowledge, demonstrating its generative capabilities when no relevant document content is available.

### Additional Insights from Other Responses
- **Computers in Science, Media, and Methods to Make Money Using Computers:**
  - These responses show the system's ability to provide detailed answers by possibly blending retrieved content and pre-trained knowledge. The answer on using computers in science and the impact on media reflect content that aligns well with the document’s themes. The response on making money with computers, while not directly covered in the PDF, showcases the system's ability to generate useful and contextually appropriate advice based on broader knowledge.

### Conclusion
The RAG system demonstrates strong performance in scenarios where information is directly present in the input document, leveraging this data effectively to provide accurate and detailed answers. When the information is only partially available or completely absent, the system still manages to provide reasonable responses, albeit sometimes generic, by relying on its underlying generative model trained on a wider corpus. This indicates a robust integration between the retrieval and generation stages, capable of handling a variety of information needs and query complexities.

The performance suggests that while the system excels in directly relevant contexts, it might benefit from enhancements in handling partially relevant information, perhaps by integrating more nuanced data or refining the generative component to make fuller use of the retrieved snippets. In cases where the information is entirely absent from the provided documents, the system's reliance on its base model's knowledge ensures that it remains useful and informative, showcasing the flexibility and comprehensive capabilities of RAG systems.
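
The three-way grouping used in this analysis was done by reading the answers. One rough way to make it less subjective is to measure how many of an answer's content words also occur in the retrieved context. The sketch below is a crude heuristic under stated assumptions, not part of the notebook's pipeline: the function names, the small stopword list, and the example strings are all invented for illustration, and exact word matching will miss paraphrases.

```python
import re

# Small hand-picked stopword list, an assumption for this sketch.
STOPWORDS = {"the", "and", "for", "are", "was", "have", "has",
             "with", "that", "this", "there", "from"}

def content_words(text):
    """Lowercased words longer than two letters that are not stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return {w for w in words if len(w) > 2 and w not in STOPWORDS}

def grounding_ratio(answer, context):
    """Fraction of the answer's content words that also occur in the context."""
    answer_words = content_words(answer)
    if not answer_words:
        return 0.0
    return len(answer_words & content_words(context)) / len(answer_words)

# Invented example strings.
context = ("Computers have transformed industries by streamlining operations "
           "and enhancing productivity.")
grounded = "Computers have transformed industries by streamlining operations."
ungrounded = "For every action there is an equal and opposite reaction."

print(grounding_ratio(grounded, context))    # high ratio: answer stays within the context
print(grounding_ratio(ungrounded, context))  # low ratio: answer comes from elsewhere
```

A ratio near 1 would correspond to the "directly present" scenario, a ratio near 0 to the "not in the PDF" scenario, and values in between to the partial case; where to draw the thresholds is a judgment call.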