In [1]:
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2404.16130" -O "data/graphrag.pdf"
!wget --user-agent "Mozilla" "https://arxiv.org/pdf/2310.11511" -O "data/selfrag.pdf"

--2024-07-02 18:39:05--  https://arxiv.org/pdf/2404.16130
Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.131.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... ^C
--2024-07-02 18:39:08--  https://arxiv.org/pdf/2310.11511
Resolving arxiv.org (arxiv.org)... 151.101.67.42, 151.101.131.42, 151.101.3.42, ...
Connecting to arxiv.org (arxiv.org)|151.101.67.42|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1405127 (1.3M) [application/pdf]
Saving to: ‘data/selfrag.pdf’

data/selfrag.pdf      4%[                    ]  62.80K  4.86KB/s    eta 4m 30s 

In [3]:
# common configs
import os
import chromadb
from dotenv import load_dotenv
from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from llama_index.core.node_parser import SentenceSplitter
from llama_index.llms.openai import OpenAI
from llama_index.vector_stores.chroma import ChromaVectorStore
from llama_index.core import StorageContext, ServiceContext
from llama_index.embeddings.openai import OpenAIEmbedding

from utils_fn.helpers import print_rag

load_dotenv()

pdf_file_path = "data/graphrag.pdf"
another_pdf_file_path = "data/selfrag.pdf"
gpt35_llm = OpenAI(model="gpt-3.5-turbo", api_key=os.environ.get("OPENAI_API_KEY"))

# 1. RAG is only as good as your data

In [10]:
# parsing: most PDF parsing is inadequate
documents = SimpleDirectoryReader(input_files=[pdf_file_path]).load_data()

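# write with an explicit encoding: the extracted text contains non-ASCII characters
with open("data/graphrag_simpledirectoryreader.txt", "w", encoding="utf-8") as f:
    for document in documents:
        f.write(document.text)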
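
A quick way to put a number on "inadequate" is to count typical extraction artifacts in the dumped text. The sketch below is only a rough heuristic: `extraction_artifacts` is a hypothetical helper (not part of llama_index), and the two patterns it checks are assumptions about what this kind of PDF text tends to look like.

```python
import re

def extraction_artifacts(text: str) -> dict:
    """Count two common PDF-extraction artifacts (rough heuristics only)."""
    return {
        # words split by a hyphen followed by whitespace, e.g. "con- text"
        "broken_hyphenation": len(re.findall(r"[a-z]-\s+[a-z]", text)),
        # a standalone diaeresis left behind when an accented letter is split in two
        "detached_diacritics": text.count("\u00a8"),
    }

# Small sample in the style of the extracted text
sample = "fit within the token limit of the con- text window; na\u00a8\u0131ve RAG"
print(extraction_artifacts(sample))
```

The same function could be run on the contents of `data/graphrag_simpledirectoryreader.txt` to compare parsers.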

In [33]:
# chunking and embedding: many parameters to optimize
db = chromadb.PersistentClient(path="data/chroma_db")
chroma_collection = db.get_or_create_collection("graphrag")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
# ServiceContext is deprecated in recent llama_index releases;
# pass the embedding model to the index directly instead
embed_model = OpenAIEmbedding()

index = VectorStoreIndex.from_documents(
    documents,
    transformations=[SentenceSplitter(chunk_size=256, chunk_overlap=0)],
    storage_context=storage_context,
    embed_model=embed_model,
)

index.storage_context.persist("data/chroma_db/graphrag")
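
To get a feel for how much `chunk_size` and `chunk_overlap` change what ends up in the vector store, here is a minimal sketch of fixed-size sliding-window chunking. It is a simplified stand-in, not how `SentenceSplitter` works internally (that one respects sentence boundaries and counts tokens); `chunk_words` and the synthetic word list are made up for illustration.

```python
def chunk_words(words, chunk_size, chunk_overlap):
    """Fixed-size sliding-window chunking over a list of words (simplified)."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + chunk_size])
        # stop once the current window already reaches the end of the input
        if start + chunk_size >= len(words):
            break
    return chunks

# Synthetic 1000-word document
words = [f"w{i}" for i in range(1000)]
for size, overlap in [(256, 0), (256, 64), (512, 0)]:
    print(size, overlap, len(chunk_words(words, size, overlap)))
```

Even in this toy setting the number of chunks, and therefore the number of embeddings and the granularity of retrieval, shifts with each setting.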


In [50]:
! streamlit run utils_fn/view_chroma.py "data/chroma_db"


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://30.222.192.144:8501

  For better performance, install the Watchdog module:

  $ xcode-select --install
  $ pip install watchdog

Opening database: data/chroma_db
Opening database: data/chroma_db
^C
  Stopping...


# 2. Questions where RAG will fail

## 2.1 Query one PDF

In [2]:
documents = SimpleDirectoryReader(input_files=[pdf_file_path]).load_data()
index = VectorStoreIndex.from_documents(
    documents,
    transformations=[SentenceSplitter(chunk_size=256, chunk_overlap=0)]
)

In [3]:
# summarization
print_rag(index, "Give me a summary of main points in Discussion Section")

Question:
Give me a summary of main points in Discussion Section

Retrieved:
Node ID: 27b14c99-9c5f-4798-894d-42a32c041798
Text: For example, a user may scan through community summaries at one
level looking for general themes of interest, then follow links to the
reports at the lower level that provide more details for each of the
subtopics. Here, however, we focus on their utility as part of a
graph-based index used for answering global queries. Community
summaries are g...
Score:  0.794


Node ID: e71528e5-433d-43b8-8f6d-40847210147a
Text: The prioritization is as follows: for each community edge in
decreasing order of combined source and target node degree (i.e.,
overall prominance), add descriptions of the source node, target node,
linked covariates, and the edge itself. •Higher-level communities . If
all element summaries fit within the token limit of the con- text
window, proc...
Score:  0.784


Node ID: 72fe94ac-6f8d-4405-96ef-85e31f209d96
Text: This approach a

In [6]:
discussion_text = """5 Discussion
Limitations of evaluation approach. Our evaluation to date has only examined a certain class of
sensemaking questions for two corpora in the region of 1 million tokens. More work is needed
to understand how performance varies across different ranges of question types, data types, and
dataset sizes, as well as to validate our sensemaking questions and target metrics with end users.
Comparison of fabrication rates, e.g., using approaches like SelfCheckGPT (Manakul et al., 2023),
would also improve on the current analysis.
Trade-offs of building a graph index. We consistently observed Graph RAG achieve the best
head-to-head results against other methods, but in many cases the graph-free approach to global
summarization of source texts performed competitively. The real-world decision about whether to
invest in building a graph index depends on multiple factors, including the compute budget, expected
number of lifetime queries per dataset, and value obtained from other aspects of the graph index
(including the generic community summaries and the use of other graph-related RAG approaches).
Future work. The graph index, rich text annotations, and hierarchical community structure
supporting the current Graph RAG approach offer many possibilities for refinement and adaptation.
This includes RAG approaches that operate in a more local manner, via embedding-based matching of
user queries and graph annotations, as well as the possibility of hybrid RAG schemes that combine
embedding-based matching against community reports before employing our map-reduce
summarization mechanisms. This “roll-up” operation could also be extended across more levels of the
community hierarchy, as well as implemented as a more exploratory “drill down” mechanism that
follows the information scent contained in higher-level community summaries.
"""

print(gpt35_llm.complete(f"Give me a summary of main points of following text: \n {discussion_text}").text)

The text discusses the limitations of the evaluation approach used, suggesting that more work is needed to understand performance variations across different types of questions, data, and dataset sizes. It also mentions the potential benefits of comparing fabrication rates using approaches like SelfCheckGPT. The text also highlights the trade-offs of building a graph index, noting that while Graph RAG consistently performed well, a graph-free approach to summarization was competitive in some cases. The decision to invest in a graph index depends on factors such as compute budget and expected number of queries. Future work could involve refining the current Graph RAG approach, exploring local matching of user queries and graph annotations, and implementing hybrid RAG schemes. Additionally, the text suggests extending the "roll-up" operation across more levels of the community hierarchy and implementing a more exploratory "drill down" mechanism.
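
The summary above works because the LLM sees the whole section, not three 256-token chunks. Instead of pasting the text by hand, the section could be sliced out of the parsed document. A minimal sketch, assuming the section headings survive extraction as plain strings; `extract_section` and the toy text are hypothetical, and a plain `find` would also match a heading that is merely mentioned elsewhere (for example in a table of contents).

```python
def extract_section(text: str, start_heading: str, end_heading: str) -> str:
    """Return the text from start_heading up to (not including) end_heading."""
    start = text.find(start_heading)
    if start == -1:
        raise ValueError(f"heading not found: {start_heading!r}")
    end = text.find(end_heading, start + len(start_heading))
    # if the end heading is missing, take everything to the end of the text
    return text[start:end if end != -1 else len(text)].strip()

# Toy stand-in for the concatenated document text
toy_paper = (
    "4 Related Work\nPrior methods.\n"
    "5 Discussion\nLimitations and trade-offs.\n"
    "6 Conclusion\nSummary."
)
section = extract_section(toy_paper, "5 Discussion", "6 Conclusion")
print(section)
```

The returned string can then be passed to `gpt35_llm.complete(...)` in the same way as `discussion_text`.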


In [11]:
# comparison
print_rag(index, "Compare the advantages and disadvantages of the proposed method with traditional RAG")

Question:
Compare the advantages and disadvantages of the proposed method with traditional RAG

Retrieved:
Node ID: 50fb98bf-e2ff-441c-9ed5-a87890072a7d
Text: Empowerment comparisons showed mixed results for both global
approaches versus na¨ıve RAG ( SS) and Graph RAG approaches versus
source text summarization ( TS). Ad-hoc LLM use to analyze LLM
reasoning for this measure indicated that the ability to provide
specific exam- ples, quotes, and citations was judged to be key to
helping users reach an i...
Score:  0.849


Node ID: 59ac5174-217d-46b0-a9c4-4cd58f058779
Text: Comparison of fabrication rates, e.g., using approaches like
SelfCheckGPT (Manakul et al., 2023), would also improve on the current
analysis. Trade-offs of building a graph index . We consistently
observed Graph RAG achieve the best head- to-head results against
other methods, but in many cases the graph-free approach to global
summa- rization o...
Score:  0.835


Node ID: 0c8dbdb2-d3fb-4922-ab3d-f458

In [16]:
print_rag(index, "compare the results generated by Graph RAG and Naive RAG in Table 2")

Question:
compare the results generated by Graph RAG and Naive RAG in Table 2

Retrieved:
Node ID: eb88f4a1-2a3c-422b-80f9-c6d8bba96956
Text: na ¨ıve RAG . As shown in Figure 4, global approaches
consistently out- performed the na ¨ıve RAG ( SS) approach in both
comprehensiveness and diversity metrics across datasets. Specifically,
global approaches achieved comprehensiveness win rates between 72-83%
for Podcast transcripts and 72-80% for News articles, while diversity
win rates range...
Score:  0.860


Node ID: 56ff2d23-c43f-475e-be96-1aaff2a3f096
Text: Answer 1, while comprehensive, includes a lot of detailed
information about various figures in different sectors of
entertainment, which, while informative, does not directly answer the
question with the same level of conciseness and specificity as Answer
2. Table 2: Example question for the News article dataset, with
generated answers from Grap...
Score:  0.848


Node ID: 50fb98bf-e2ff-441c-9ed5-a87890072a7d
Text: Em

In [12]:
# General Multi-part Question
print_rag(index, "Tell me about the drawbacks about RAG, and tell me about existing methods in advanced RAG, and then generate your own conclusion about the unique contribution of the proposed method.")

Question:
Tell me about the drawbacks about RAG, and tell me about existing methods in advanced RAG, and then generate your own conclusion about the unique contribution of the proposed method.

Retrieved:
Node ID: 0c8dbdb2-d3fb-4922-ab3d-f4586c3a7718
Text: More advanced variations exist, but all solve the problem of
what to do when an external dataset of interest exceeds the LLM’s
context window. Advanced RAG systems include pre-retrieval, retrieval,
post-retrieval strategies designed to over- come the drawbacks of Na
¨ıve RAG, while Modular RAG systems include patterns for iterative and
dynamic c...
Score:  0.846


Node ID: 50fb98bf-e2ff-441c-9ed5-a87890072a7d
Text: Empowerment comparisons showed mixed results for both global
approaches versus na¨ıve RAG ( SS) and Graph RAG approaches versus
source text summarization ( TS). Ad-hoc LLM use to analyze LLM
reasoning for this measure indicated that the ability to provide
specific exam- ples, quotes, and citations was jud
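
A multi-part question is embedded as a single vector, so the retrieved chunks tend to match only one of its parts. One possible workaround is to split the question and retrieve per part. The sketch below is deliberately naive: `split_multipart` is a hypothetical helper that only splits on ", and" / ", and then" connectors, and real questions would need something more robust (for example an LLM-based decomposition).

```python
import re

def split_multipart(question: str) -> list[str]:
    """Naively split a question on ', and' / ', and then' connectors."""
    parts = re.split(r",\s*and(?:\s+then)?\s+", question.strip())
    # drop empty fragments and the trailing full stop
    return [p.strip().rstrip(".") for p in parts if p.strip()]

question = (
    "Tell me about the drawbacks about RAG, and tell me about existing methods "
    "in advanced RAG, and then generate your own conclusion about the unique "
    "contribution of the proposed method."
)
for part in split_multipart(question):
    print(part)
```

Each part could then be passed to `print_rag(index, part)` on its own; the final "generate your own conclusion" part still needs the answers to the first two as context.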

## 2.2 Query two PDFs

In [5]:
two_documents = SimpleDirectoryReader(input_files=[pdf_file_path, another_pdf_file_path]).load_data()
two_doc_index = VectorStoreIndex.from_documents(
    two_documents,
    transformations=[SentenceSplitter(chunk_size=256, chunk_overlap=0)]
)

In [10]:
# comparison
print_rag(two_doc_index, "Compare the advantages and disadvantages of Graph RAG with Self-RAG")

Question:
Compare the advantages and disadvantages of Graph RAG with Self-RAG

Retrieved:
Node ID: de337a7b-7bf1-4502-9c93-5cf68959997f
Text: Comparison of fabrication rates, e.g., using approaches like
SelfCheckGPT (Manakul et al., 2023), would also improve on the current
analysis. Trade-offs of building a graph index . We consistently
observed Graph RAG achieve the best head- to-head results against
other methods, but in many cases the graph-free approach to global
summa- rization o...
Score:  0.864


Node ID: 66e17277-c312-4713-89ae-b48b0680045e
Text: Empowerment comparisons showed mixed results for both global
approaches versus na¨ıve RAG ( SS) and Graph RAG approaches versus
source text summarization ( TS). Ad-hoc LLM use to analyze LLM
reasoning for this measure indicated that the ability to provide
specific exam- ples, quotes, and citations was judged to be key to
helping users reach an i...
Score:  0.858


Node ID: 768b0b36-b44d-44bd-8e7d-c9d419815ca6
Text: Mo

In [12]:
# General Multi-part Question
print_rag(two_doc_index, "Tell me about the drawbacks of Graph RAG, and tell me about advantages of Self-RAG, and then generate your own conclusion about the relationship of the two methods.")

Question:
Tell me about the drawbacks of Graph RAG, and tell me about advantages of Self-RAG, and then generate your own conclusion about the relationship of the two methods.

Retrieved:
Node ID: 768b0b36-b44d-44bd-8e7d-c9d419815ca6
Text: More advanced variations exist, but all solve the problem of
what to do when an external dataset of interest exceeds the LLM’s
context window. Advanced RAG systems include pre-retrieval, retrieval,
post-retrieval strategies designed to over- come the drawbacks of Na
¨ıve RAG, while Modular RAG systems include patterns for iterative and
dynamic c...
Score:  0.852


Node ID: de337a7b-7bf1-4502-9c93-5cf68959997f
Text: Comparison of fabrication rates, e.g., using approaches like
SelfCheckGPT (Manakul et al., 2023), would also improve on the current
analysis. Trade-offs of building a graph index . We consistently
observed Graph RAG achieve the best head- to-head results against
other methods, but in many cases the graph-free approach to g
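
The chunks shown above all read like Graph RAG text, which suggests one of the two papers may be missing from the retrieved context. A quick diagnostic is to count retrieved chunks per source file. This is a sketch under assumptions: `count_sources` is a hypothetical helper, the metadata dicts below are invented for illustration, and it relies on `SimpleDirectoryReader` attaching a `file_name` entry to each node's metadata, which it normally does.

```python
from collections import Counter

def count_sources(node_metadata):
    """Count retrieved chunks per source file; missing names go under 'unknown'."""
    return Counter(meta.get("file_name", "unknown") for meta in node_metadata)

# Hypothetical metadata for three retrieved chunks, for illustration only
retrieved = [
    {"file_name": "graphrag.pdf", "page_label": "11"},
    {"file_name": "graphrag.pdf", "page_label": "9"},
    {"file_name": "graphrag.pdf", "page_label": "2"},
]
counts = count_sources(retrieved)
# files the question needs but that contributed no chunk
missing = {"graphrag.pdf", "selfrag.pdf"} - set(counts)
print(counts, missing)
```

In the notebook, the list of dicts would come from the `metadata` of the nodes returned by the retriever. A comparison question can only be answered if both files show up in the counts.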