# Document Summarization RAG using OpenAI GPT-4 + LlamaIndex + W&B
 
This notebook builds a prototype Retrieval-Augmented Generation (RAG) system that summarizes a large document using OpenAI GPT-4, LlamaIndex, and Weights & Biases (W&B).

It covers setting up a local RAG system and tracking experiments with W&B.

## Approach

1. **Load a PDF document using the LlamaIndex PDFReader:** The PDF is the external source that grounds the LLM's answers in context. In this case, the PDF covers 'AI and Big Data in Finance'.
2. **Segment the large PDF into chunks and embed their contents:** An embedding model converts each chunk into a vector.
3. **Save the embedded vectors using VectorStore:** These vectors can then be searched at query time to retrieve the chunks that give the LLM enriched context.
4. **Set up the RAG system using the LlamaIndex ServiceContext:** ServiceContext bundles the LLM, a pretrained embedding model, and the chunk size that LlamaIndex uses to index the document and answer queries.
5. **A working RAG system:** At this point, we have a working RAG prototype that can summarize the PDF and answer queries about it using the PDF's own content as context.
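
To make steps 2, 3, and 5 concrete, here is a minimal, library-free sketch of chunking, embedding, and retrieval. It is an illustration under stated assumptions, not what LlamaIndex does internally: the helper names (`chunk_words`, `embed`, `cosine`, `retrieve`) are hypothetical, chunks are measured in words rather than tokens, and a bag-of-words count vector stands in for a real embedding model.

```python
import math
from collections import Counter

def chunk_words(text, chunk_size):
    """Split text into consecutive chunks of at most chunk_size words."""
    words = text.split()
    return [" ".join(words[i:i + chunk_size]) for i in range(0, len(words), chunk_size)]

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, top_k=1):
    """Return the top_k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:top_k]

# Made-up text, used only to exercise the functions.
text = ("banks use machine learning for credit scoring "
        "regulators study risks of big data in finance")
chunks = chunk_words(text, chunk_size=7)
best = retrieve("credit scoring with machine learning", chunks)
print(best)
```

In a real pipeline, the retrieved chunks are placed into the LLM prompt as context for the final answer.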

<img src="img1.png">

---
##  Importing Necessary Libraries

In [1]:
#!pip install -r requirements.txt

In [1]:
# Importing required libraries

# Standard library
import copy
import os
import random
import time
import warnings
from pathlib import Path

# Third-party
import nest_asyncio
import openai
import pandas as pd
import wandb
from dotenv import load_dotenv  # loads environment variables (e.g. the API key) from .env

# LlamaIndex
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex, download_loader
from llama_index.callbacks import CallbackManager, WandbCallbackHandler
from llama_index.llms import OpenAI
from llama_index.llms.llama_utils import completion_to_prompt, messages_to_prompt
from llama_index.response.notebook_utils import display_response

In [3]:
# warnings.filterwarnings("ignore")
WANDB_PROJECT = "beakbook_project_testing_v1"

## Chunk Size 2048

In [4]:
chunk_size = 2048

### Loading and reading the document
Download the PDFReader loader with LlamaIndex's `download_loader` and use it to read the PDF.

In [5]:
PDFReader = download_loader("PDFReader")
loader = PDFReader()
# documents = loader.load_data(file=Path("./olmo.pdf"))
documents = loader.load_data(file=Path("./Artificial-intelligence-machine-learning-big-data-in-finance.pdf"))

 ### Initialize Weights & Biases (W&B)
 Weights & Biases (W&B) is used for tracking experiments, visualizing data, and sharing insights. We initialize it here for our project.

In [6]:
# Initialize W&B for tracking and visualizations

wandb_args = {"project": WANDB_PROJECT, "name": "ai_finance_rag"}
wandb_callback = WandbCallbackHandler(run_args=wandb_args)
callback_manager = CallbackManager([wandb_callback])

[34m[1mwandb[0m: Streaming LlamaIndex events to W&B at https://wandb.ai/shanxnard/beakbook_project_testing_v1/runs/3qte1lao
[34m[1mwandb[0m: `WandbCallbackHandler` is currently in beta.
[34m[1mwandb[0m: Please report any issues to https://github.com/wandb/wandb/issues with the tag `llamaindex`.


## Setting up RAG

### Get the GPT 4 model for setting up the RAG system

The temperature is set to 0.1 so that text generation stays close to deterministic, with only slight variation between runs.
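
As a rough illustration of why a low temperature reduces variation, the sketch below applies the standard temperature-scaled softmax to hypothetical token scores. The logits and the function name are made up for the example, and this is the textbook formulation rather than a description of OpenAI's internal sampling.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities; lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical scores for three candidate tokens
p_low = softmax_with_temperature(logits, 0.1)
p_default = softmax_with_temperature(logits, 1.0)
print("T=0.1:", [round(p, 4) for p in p_low])
print("T=1.0:", [round(p, 4) for p in p_default])
```

At a temperature of 0.1 nearly all probability mass sits on the top-scoring token, so repeated runs tend to produce very similar text.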

In [7]:
load_dotenv()  # read OPENAI_API_KEY from a local .env file
openai.api_key = os.getenv("OPENAI_API_KEY")

# llm = OpenAI(model="gpt-3.5-turbo", temperature=0.1)
llm = OpenAI(model="gpt-4-1106-preview", temperature=0.1)

 ### Setup ServiceContext
 Setting up the ServiceContext with the language model and embedding model (for converting text into vector representations)

In [8]:
start_time = time.time()


embed_model = "local:BAAI/bge-small-en-v1.5"  # embedding model run locally
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    callback_manager=callback_manager,
    chunk_size=chunk_size,
)  # loads the embedding model; took about 11 seconds in this run


elapsed_time = time.time() - start_time
print(elapsed_time)
wandb.log({"chunking": elapsed_time})

10.739284992218018


 ### Create Vectorized documents (from the PDF) using VectorStore

In [10]:
start_time = time.time()

index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Converting the index to a query engine for retrieval
query_engine = index.as_query_engine()
# ~1 minute

elapsed_time = time.time() - start_time
print(elapsed_time)
wandb.log({"vectorizing": elapsed_time})

[34m[1mwandb[0m: Logged trace tree to W&B.


69.94628262519836


### Testing the RAG system
Ask the RAG system a question about the loaded document.

In [11]:
# Defining a function to display responses

def query_and_display(question):
    response = query_engine.query(question)
    display_response(response)

In [12]:
start_time = time.time()

query_and_display("Can you give me a short 6-7 lines summary of the book?")


elapsed_time = time.time() - start_time
print(elapsed_time)
wandb.log({"summarization_or_querying": elapsed_time})

[34m[1mwandb[0m: Logged trace tree to W&B.


**`Final Response:`** The book titled "Artificial Intelligence, Machine Learning, Big Data in Finance" appears to be a comprehensive resource on the intersection of advanced technologies and the financial sector. It likely covers the impact and applications of AI, machine learning, and big data analytics in finance, exploring how these technologies are transforming financial services. The content may include discussions on regulatory challenges, innovation in financial products, risk management, and the future of finance in the context of rapid technological advancements. The presence of the OECD website suggests that the book might also address international perspectives and policy considerations related to the adoption of these technologies in the finance industry.

11.923530101776123


## Chunk Size 1024

In [13]:
chunk_size=1024

In [14]:
# Initialize W&B for tracking and visualizations

wandb_args = {"project": WANDB_PROJECT, "name": "ai_finance_rag_1024"}
wandb_callback = WandbCallbackHandler(run_args=wandb_args)
callback_manager = CallbackManager([wandb_callback])

In [15]:
start_time = time.time()


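embed_model = "local:BAAI/bge-small-en-v1.5"  # embedding model run locally
service_context = ServiceContext.from_defaults(
    llm=llm,
    embed_model=embed_model,
    callback_manager=callback_manager,
    chunk_size=chunk_size,
)  # about 0.5 seconds in this run, likely because the embedding model was already cached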


elapsed_time = time.time() - start_time
print(elapsed_time)
wandb.log({"chunking": elapsed_time})

0.5027825832366943


In [16]:
start_time = time.time()


index = VectorStoreIndex.from_documents(documents, service_context=service_context)

# Converting the index to a query engine for retrieval
query_engine = index.as_query_engine()
# ~1 minute


elapsed_time = time.time() - start_time
print(elapsed_time)
wandb.log({"vectorizing": elapsed_time})

[34m[1mwandb[0m: Logged trace tree to W&B.


79.39982175827026


In [17]:
start_time = time.time()

query_and_display("Can you give me a short 6-7 lines summary of the book?")


elapsed_time = time.time() - start_time
print(elapsed_time)
wandb.log({"summarization_or_querying": elapsed_time})

[34m[1mwandb[0m: Logged trace tree to W&B.


**`Final Response:`** The book titled "Artificial Intelligence, Machine Learning, Big Data in Finance" appears to explore the intersection of advanced technologies with the financial sector. It likely discusses how artificial intelligence (AI) and machine learning (ML) are transforming financial services, from risk assessment to algorithmic trading and personalized financial guidance. The book may also delve into the implications of big data analytics in finance, examining both the opportunities for innovation and the challenges related to privacy, security, and regulation. As it is referenced by the OECD, an organization that often focuses on economic development and policy, the book might also touch on the policy aspects of these technological advancements in the finance industry.

10.518078565597534


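### Query time drops slightly with the smaller chunk size

With chunk size 1024 the query took about 10.5 s, versus about 11.9 s with chunk size 2048, while building the index took longer (about 79.4 s versus 69.9 s). These are single-run timings, so the differences may be within run-to-run noise.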

In [18]:
# Closing the W&B run after queries
wandb_callback.finish()
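
The `start_time` / `elapsed_time` pattern repeated in the cells above could be wrapped in a small helper. The sketch below is one possible way to do that; `timed` and its `log_fn` parameter are hypothetical names, not part of LlamaIndex or W&B.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(step_name, log_fn=print):
    """Time the enclosed block and pass {step_name: seconds} to log_fn."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        log_fn({step_name: elapsed})

# Collect timings in a list here; a placeholder computation stands in for real work.
records = []
with timed("vectorizing", log_fn=records.append):
    sum(range(1000))
print(records)
```

In the notebook, passing `log_fn=wandb.log` while a run is active would log the same dictionary shape that the cells currently pass to `wandb.log` by hand.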

---
## Evaluation of chunk size against metrics:

### Faithfulness

This metric evaluates the factual consistency of the generated answer with the given context. Faithfulness is measured on a scale from 0 to 1, where higher scores indicate a greater degree of consistency. An answer is considered faithful if all claims made within it can be directly inferred from the provided context.

#### Example

- **Question:** Who invented the telephone?
- **Context:** Alexander Graham Bell was a Scottish-born inventor, scientist, and engineer who is credited with inventing and patenting the first practical telephone on March 7, 1876.
- **High Faithfulness Answer:** Alexander Graham Bell, a Scottish-born inventor, is credited with inventing the first practical telephone on March 7, 1876.
- **Low Faithfulness Answer:** The telephone was invented by Alexander Graham Bell in Scotland, in April 1876.
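
One common way to turn this definition into a number is the fraction of an answer's claims that the context supports. The sketch below applies that idea to the telephone example. It assumes the claims have already been extracted and judged by hand (in practice an LLM or human judge usually does this), and `faithfulness_score` is a hypothetical helper, not a library function.

```python
def faithfulness_score(claim_supported):
    """Fraction of an answer's claims that can be inferred from the context."""
    if not claim_supported:
        return 0.0
    return sum(claim_supported.values()) / len(claim_supported)

# Hand-labeled claims for the two example answers (True = supported by the context).
high = {
    "Bell is credited with inventing the first practical telephone": True,
    "Bell was Scottish-born": True,
    "The date was March 7, 1876": True,
}
low = {
    "Bell invented the telephone": True,
    "The invention took place in Scotland": False,
    "The date was April 1876": False,
}
print(faithfulness_score(high), faithfulness_score(low))
```

The low-faithfulness answer scores one third because only one of its three claims follows from the context.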

### Relevancy

This metric assesses how relevant the generated answer is to the given question. It is calculated on a scale from 0 to 1, where higher values signify greater relevance. Answers that are incomplete or contain unnecessary information receive lower scores.

#### Example

- **Question:** What are the main ingredients in a Margherita pizza?
- **Low Relevance Answer:** A pizza typically includes dough, tomatoes, and cheese.
- **High Relevance Answer:** A Margherita pizza specifically uses fresh tomatoes, mozzarella cheese, fresh basil, salt, and extra-virgin olive oil as its main ingredients, reflecting its Italian origin.


## Other Metrics

Metrics can be categorized into three groups: Honest, Harmless, and Helpful.

### Honest Evaluations

Faithfulness and Relevancy also belong to this category; they are described above, so they are not repeated here.

1. **No Answer Ratio:** The proportion of queries for which the RAG provides no answer, using phrases like "The question cannot be answered" or "The answer is not in the documents". It ranges between 0 (always provides an answer) and 1 (never provides an answer).
2. **Recall:** Measures the frequency with which the correct document is among those retrieved, across a set of queries. It ranges from 0 (no correct document retrieved) to 1 (all correct documents retrieved).
3. **Mean Average Precision (mAP):** For each query, average precision is the mean of the precision values at each rank where a correct document appears; mAP is the mean of this value across queries. It ranges from 0 (no correct matches) to 1 (all correct documents are ranked at the top).
4. **Mean Reciprocal Rank (MRR):** The average, across queries, of the reciprocal of the rank of the first correct result. It ranges from 0 to 1, and higher scores mean the first correct result tends to appear nearer the top.
5. **Normalized Discounted Cumulative Gain (NDCG):** A ranking performance measure that focuses on the relevance and position of documents in search results. Scores range from 0.0 to 1.0, with 1.0 being the perfect score.
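
The metrics in this list can be sketched for a single query with binary relevance as follows. The function names, document IDs, and refusal phrases are illustrative assumptions; mAP and MRR are the means of `average_precision` and `reciprocal_rank` over all queries.

```python
import math

def no_answer_ratio(answers, refusal_phrases):
    """Share of answers that contain one of the refusal phrases."""
    refused = sum(any(p.lower() in a.lower() for p in refusal_phrases) for a in answers)
    return refused / len(answers) if answers else 0.0

def recall_at_k(retrieved, relevant, k):
    """Share of relevant documents found in the top k results."""
    hits = len(set(retrieved[:k]) & set(relevant))
    return hits / len(relevant) if relevant else 0.0

def average_precision(retrieved, relevant):
    """Mean of precision values at each rank where a relevant document appears."""
    relevant = set(relevant)
    hits, precisions = 0, []
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(relevant) if relevant else 0.0

def reciprocal_rank(retrieved, relevant):
    """1 / rank of the first relevant document, or 0 if none is retrieved."""
    relevant = set(relevant)
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(retrieved, relevant, k):
    """DCG of the top k results divided by the DCG of an ideal ranking."""
    relevant = set(relevant)
    dcg = sum(1.0 / math.log2(rank + 1)
              for rank, doc in enumerate(retrieved[:k], start=1) if doc in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 1) for rank in range(1, ideal_hits + 1))
    return dcg / idcg if idcg else 0.0

# Hypothetical retrieval result for one query.
retrieved = ["d3", "d1", "d7", "d2"]
relevant = ["d1", "d2"]
answers = ["The question cannot be answered", "AI is used for credit scoring."]
phrases = ["cannot be answered", "not in the documents"]
print(recall_at_k(retrieved, relevant, 3), average_precision(retrieved, relevant),
      reciprocal_rank(retrieved, relevant), round(ndcg_at_k(retrieved, relevant, 4), 3),
      no_answer_ratio(answers, phrases))
```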

### Harmless Evaluations - Risk metrics

1. **PII Detection:** Assesses the RAG's ability to identify and secure Personally Identifiable Information, using tools for detecting and redacting sensitive information.
2. **Toxicity:** Measures the presence of offensive or harmful language in AI responses, aiming to mitigate bias and promote respectful communication.
3. **Stereotyping:** Evaluates outputs for language that may perpetuate harmful stereotypes, focusing on avoiding bias related to race, gender, age, etc.
4. **Jailbreaks:** Measures how well the system resists prompts that try to make it engage in or promote harmful activities, so that the advice and actions it suggests stay safe and responsible.
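
As a small illustration of the PII item, the sketch below redacts email addresses and phone numbers with regular expressions and counts the matches. The patterns and the `redact_pii` helper are simplified assumptions for demonstration; production systems typically rely on dedicated PII detection tools with much broader coverage.

```python
import re

# Deliberately simple patterns; they will miss many real-world formats.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def redact_pii(text):
    """Replace emails and phone numbers with placeholders; return text and match count."""
    text, n_email = EMAIL_RE.subn("[EMAIL]", text)
    text, n_phone = PHONE_RE.subn("[PHONE]", text)
    return text, n_email + n_phone

redacted, count = redact_pii("Contact jane.doe@example.com or +1 555-123-4567.")
print(redacted, count)
```

A PII metric could then be the share of responses in which `count` is greater than zero.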

### Helpful Evaluations

1. **Language Mismatch:** Checks whether the response is in the same language as the user's input, typically measured with a language detection algorithm.
2. **Conciseness:** Evaluates the succinctness of AI responses, aiming for efficient communication without unnecessary verbosity.
3. **Coherence:** Measures the logical flow and structure of the AI's output, ensuring it reads naturally, often requiring human judgment for assessment.
4. **Fluency:** Assesses grammatical and syntactic correctness, with scores ranging from 1 (poor quality) to 5 (perfect grammatical correctness).


---
## Scope 
### Extension to the Current Pipeline: 
#### Risk and Security in RAG Solutions with API Gateway

- **Centralized Gateway Security:** A RAG system can be paired with an API Gateway interface for accessing a Large Language Model. For example, Amazon API Gateway enhances control and security over model access. This approach includes restricting access via API keys to minimize risks and enable secure monitoring.
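
The idea of restricting model access by API key can be sketched in a few lines. This is a toy, in-process stand-in and not Amazon API Gateway's actual interface: `ALLOWED_KEYS`, `authorize`, `gateway`, and the response shape are all hypothetical.

```python
import hmac

# Hypothetical mapping of API key -> client name; real keys belong in a secret store.
ALLOWED_KEYS = {"demo-key-123": "analyst"}

def authorize(api_key):
    """Return the client name for a known key, or None (constant-time comparison)."""
    for known_key, client in ALLOWED_KEYS.items():
        if hmac.compare_digest(api_key.encode(), known_key.encode()):
            return client
    return None

def gateway(api_key, question, backend):
    """Forward the question to the backend only when the key is valid."""
    client = authorize(api_key)
    if client is None:
        return {"status": 403, "body": "Forbidden"}
    return {"status": 200, "client": client, "body": backend(question)}

# A trivial function stands in for the RAG query engine.
denied = gateway("wrong-key", "Summarize the report", lambda q: "summary")
allowed = gateway("demo-key-123", "Summarize the report", lambda q: "summary")
print(denied, allowed)
```

Because every request passes through one function, the same place can also record which client asked what, which is the monitoring benefit described above.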
