<a href="https://colab.research.google.com/github/arav-dhoot/DL-Workshop/blob/main/rag.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## **Rag-Tag Implementation of RAG**

In [1]:
%pip install llama_index
%pip install llama-index-readers-file pymupdf
%pip install llama-index-embeddings-huggingface
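# PyMuPDF (installed above as `pymupdf`) already provides the `fitz` module;
# the separate PyPI package named `fitz` is unrelated and is not needed.
%pip install -q transformers einops accelerate langchain bitsandbytes sentence_transformers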
%pip install llama-index-llms-huggingface-api
%pip install llama-index-llms-huggingface
%pip install -U langchain-community



In [2]:
!huggingface-cli login


    A token is already saved on your machine. Run `huggingface-cli whoami` to get more information or `huggingface-cli logout` if you want to log out.
    Setting a new token will erase the existing one.
    To log in, `huggingface_hub` requires a token generated from https://huggingface.co/settings/tokens .
Enter your token (input will not be visible): 
Add token as git credential? (Y/n) n
Token is valid (permission: fineG

In [3]:
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

In [4]:
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en")

The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab (https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.


In [5]:
from pathlib import Path
from llama_index.readers.file import PyMuPDFReader

In [6]:
from llama_index.core import VectorStoreIndex
from llama_index.llms.huggingface import HuggingFaceLLM
from llama_index.core.prompts.prompts import SimpleInputPrompt

In [7]:
prompt = """You are a specialized data extraction assistant designed to retrieve, parse, and summarize information from companies' SEC filings. Your tasks include analyzing structured and unstructured data, identifying relevant sections, and summarizing key details. You must output concise, accurate, and well-structured data based on user queries."""
query_wrapper = SimpleInputPrompt('<|USER|>{query_str}<|ASSISTANT|>')

In [8]:
import torch
from transformers import BitsAndBytesConfig
from llama_index.llms.huggingface.base import HuggingFaceLLM

llm = HuggingFaceLLM(
    context_window=4096,
    max_new_tokens=256,
    generate_kwargs={'temperature': 0.1},
    system_prompt=prompt,
    query_wrapper_prompt=query_wrapper,
    tokenizer_name='meta-llama/Meta-Llama-3-8B',
    model_name='meta-llama/Meta-Llama-3-8B',
    # Request 8-bit quantization through `quantization_config`; passing
    # `load_in_8bit` directly is deprecated in transformers.
    model_kwargs={
        'torch_dtype': torch.float16,
        'quantization_config': BitsAndBytesConfig(load_in_8bit=True),
    },
)



In [9]:
from llama_index.core import Settings

Settings.llm = llm
Settings.embed_model = embed_model
Settings.num_output = 1024

In [10]:
import os
os.path.exists('/content/apple-annual-report.pdf')

False

In [11]:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader


documents = SimpleDirectoryReader('/content/data').load_data()
index = VectorStoreIndex.from_documents(documents)

In [12]:
rag = index.as_query_engine()
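
The query engine first retrieves the chunks most similar to the query and then passes them to the LLM as context. The snippet below is only a conceptual sketch of that retrieval step, not how LlamaIndex implements it: the bag-of-words `embed` function stands in for the `bge-small-en` embedding model, and the helper names (`embed`, `cosine`, `retrieve`) and the three example chunks are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (assumption): a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks ranked by similarity to the query."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:top_k]


# Hypothetical chunks, used only to illustrate the ranking.
chunks = [
    "Inventories are measured using the first-in, first-out method.",
    "The Company released the iPhone 14 and Apple Watch Series 8.",
    "The European Commission announced its State Aid Decision in 2016.",
]

best = retrieve("How are inventories measured?", chunks, top_k=1)
print(best[0])
```

A query with no overlap with any chunk still gets the `top_k` least-bad chunks back, which is one reason out-of-scope questions are worth testing separately.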

In [13]:
response = rag.query("Summarize this document")
print(response.response[:response.response.rfind('.') + 1])

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


 The document is a summary of the Apple Inc. 2022 Form 10-K. It includes information on the Company's financial statements, accounting policies, and revenue recognition. The document also provides details on the Company's debt securities, including the indentures and officer's certificates. The document is a comprehensive and detailed summary of the Company's financial statements and accounting policies.
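
The `print` call above cuts the response at its last period, so a sentence truncated by `max_new_tokens` is not shown. One edge case is worth noting: if the response contains no period at all, `rfind` returns `-1` and the slice becomes `[:0]`, which prints an empty string. A minimal sketch of a safer helper follows; the name `trim_to_last_sentence` is hypothetical.

```python
def trim_to_last_sentence(text: str) -> str:
    """Drop a trailing incomplete sentence; keep the text if it has no period."""
    cut = text.rfind('.')
    if cut == -1:
        # No period found: return the text unchanged instead of an empty string.
        return text
    return text[:cut + 1]


print(trim_to_last_sentence("First sentence. Second sentence. Third sen"))
print(trim_to_last_sentence("no period here"))
```

With the query engine from this notebook it would be called as `trim_to_last_sentence(response.response)`.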


In [14]:
%pip install ragas



## **Evaluation**

My hypothesis is that the RAG model will be ineffective at answering queries that:

*   are outside the scope of the document
*   ask specifics about graphs or tables
*   require mathematical calculations or understanding
*   ask for lists

In [15]:
sample_queries = [
    "What is NVIDIA?",
    "What color is an apple?",
    "What was the value of Apple's stock on 9/25/21?",
    "What was the percentage increase in iPhone sales in 2022 compared to 2021?",
    "How are inventories measured?",
    "What was the European Commission State Aid Decision?",
    "How many RSUs vested on September 28, 2019?",
    "What is the state of Apple's financials in 2025?",
    "List all the directors at Apple.",
    "What were the new products released by Apple this past year."
]

expected_responses = [
    "NVIDIA is an American multinational corporation that designs and sells graphics processing units (GPUs) and other computing hardware and software. NVIDIA is a leader in artificial intelligence (AI) and accelerated computing.",
    "Apples can be many colors, including red, green, yellow, pink, or russetted. The most common color for apples is red.",
    "Apple's stock was valued at approximately $400 per share on 9/25/21.",
    "Apple sold 7% more phones in 2022 compared to 2021.",
    "Inventories are measured using the first-in, first-out method.",
    "On August 30, 2016, the European Commission announced its decision that Ireland granted state aid to the Company by providing tax opinions in 1991 and 2007 concerning the tax allocation of profits of the Irish branches of two subsidiaries of the Company (the \"State Aid Decision\").",
    "157,743 RSUs were vested on September 28, 2019.",
    "The financial status of Apple in 2025 remains unpredictable due to various market dynamics and external uncertainties.",
    "Timothy D. Cook, Lucas Maestri, Chris Kondo, James A. Bell, Al Gore, Alex Gorsky, Andrea Jung, Arthur D. Levinson, Monica Lozano, Ronald D. Sugar, Susan L. Wagner.",
    "Updated MacBook Pro 14” and MacBook Pro 16”, powered by the Apple M1 Pro or M1 Max chip, Third generation of AirPods, Updated iPhone SE with 5G technology, All-new Mac Studio, powered by the Apple M1 Max or M1 Ultra chip, All-new Studio Display, Updated iPad Air with 5G technology, powered by the Apple M1 chip, Updated MacBook Air and MacBook Pro 13”, both powered by the Apple M2 chip, iOS 16, macOS Ventura, iPadOS 16 and watchOS 9, updates to the Company’s operating systems, Apple Pay Later, a buy now, pay later service, iPhone 14, iPhone 14 Plus, iPhone 14 Pro and iPhone 14 Pro Max, Second generation of AirPods Pro, Apple Watch Series 8, updated Apple Watch SE and all-new Apple Watch Ultra."
]

In [16]:
dataset = []

for query,reference in zip(sample_queries,expected_responses):

    response = rag.query(query)
    dataset.append(
        {
            "user_input":query,
            "response":response,
            "reference":reference
        }
    )

Setting `pad_token_id` to `eos_token_id`:128001 for open-end generation.


In [17]:
for item in dataset:
  item['response'] = item['response'].response

In [18]:
%pip install nltk



In [19]:
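from nltk.translate.bleu_score import sentence_bleu

for item in dataset:
    # Unigram BLEU (weights=[1]): clipped word-overlap precision of the
    # response against the reference, scaled by a brevity penalty.
    bleu_score = sentence_bleu(
        [item['reference'].split()],
        item['response'].split(),
        weights=[1],
    )
    print(bleu_score)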

0.12751677852348997
0.024390243902439022
0
0.00012340980408667956
0.033898305084745756
0.11167512690355333
0.75
0.0358974358974359
0.5921653626850814
0.6096256684491979
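
To make these numbers easier to interpret, here is a minimal sketch of what unigram BLEU computes for a single reference: clipped unigram precision multiplied by a brevity penalty. It follows the standard BLEU definition rather than reproducing NLTK's code, the function name `unigram_bleu` is hypothetical, and the example response is made up for illustration.

```python
import math
from collections import Counter


def unigram_bleu(reference: str, hypothesis: str) -> float:
    """Unigram BLEU for one reference, using whitespace tokenization."""
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    if not hyp_tokens:
        return 0.0

    ref_counts = Counter(ref_tokens)
    hyp_counts = Counter(hyp_tokens)
    # Clipped matches: a word counts at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_counts[token]) for token, count in hyp_counts.items())
    precision = clipped / len(hyp_tokens)

    # Brevity penalty: responses shorter than the reference are scaled down.
    if len(hyp_tokens) > len(ref_tokens):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref_tokens) / len(hyp_tokens))
    return brevity_penalty * precision


reference = "Inventories are measured using the first-in, first-out method."
hypothesis = "Inventories are measured using FIFO."  # made-up response
score = unigram_bleu(reference, hypothesis)
print(round(score, 3))
```

Two things follow from the definition. A long, verbose response lowers the precision term, and a very short response is hit by the brevity penalty, so a low score does not by itself mean the answer was wrong. Because tokens are split on whitespace, `method.` and `method` also count as different words.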


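Conclusion: Contrary to my hypothesis, the model was able to answer questions about subject matter outside the scope of the document. It was also able to pick up specific information from the document and generate lists: the RSU query (0.75) and the two list queries (0.59 and 0.61) received the highest unigram BLEU scores.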
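
One way to relate the scores back to the hypothesis is to average them per query category. The sketch below reuses the ten scores printed above in query order. The category assigned to each query is my own guess and not something the notebook records, and the "in document" label is an extra group added for the two plain lookup questions.

```python
from statistics import mean

# Unigram BLEU scores printed above, in the same order as sample_queries.
scores = [
    0.12751677852348997, 0.024390243902439022, 0, 0.00012340980408667956,
    0.033898305084745756, 0.11167512690355333, 0.75, 0.0358974358974359,
    0.5921653626850814, 0.6096256684491979,
]

# ASSUMPTION: hand-assigned labels, one per query, in the same order.
categories = [
    "outside scope", "outside scope", "graphs/tables", "calculation",
    "in document", "in document", "graphs/tables", "outside scope",
    "lists", "lists",
]

by_category = {}
for category, score in zip(categories, scores):
    by_category.setdefault(category, []).append(score)

mean_by_category = {category: mean(values) for category, values in by_category.items()}

# Print categories from highest to lowest mean score.
for category, value in sorted(mean_by_category.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{category:15s} {value:.3f}")
```

With only one to three queries per category these averages are indicative at best, and unigram BLEU measures word overlap with the reference rather than factual correctness.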

Credits: https://www.youtube.com/watch?v=f-AXdiCyiT8 and https://docs.llamaindex.ai/en/stable/examples/low_level/oss_ingestion_retrieval/