# Knowledge Graph RAG

<img src="./media/graph_start.png" width=600>

*[Improving Knowledge Graph Completion with Generative LM and neighbors](https://deeppavlov.ai/research/tpost/bn15u1y4v1-improving-knowledge-graph-completion-wit)*

In the evolving landscape of AI and information retrieval, knowledge graphs have emerged as a powerful way to represent complex, interconnected information. A knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities. [Source: Wikipedia](https://en.wikipedia.org/wiki/Knowledge_graph)

What makes knowledge graphs particularly powerful is their ability to mirror human cognition in data. They explicitly map how objects, concepts, and ideas connect to one another, through both semantic and relational links. This closely parallels how our brains naturally understand and internalize information: not as isolated facts, but as a web of interconnected concepts and relationships.

<img src="./media/coffee_graph_ex.png" width=400>

Looking at a concept like "coffee," we don't just know it's a beverage; we automatically connect it to related concepts like beans, brewing methods, caffeine, morning routines, and social interactions. Knowledge graphs capture these natural associations in a structured way.

Traditional RAG systems, while effective at semantic similarity-based retrieval, often struggle to capture broader conceptual relationships across text chunks. Knowledge Graph RAG addresses this limitation by introducing a structured, hierarchical approach to information organization and retrieval. By representing data in a graph format, these systems can traverse relationships between concepts, enabling more sophisticated query understanding and response generation. This approach allows for targeted querying along specific relationship paths, handles complex multi-hop questions, and provides clearer reasoning through explicit connection paths. The result is a more nuanced and interpretable system that combines the structured reasoning of knowledge graphs with the natural language capabilities of large language models.
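
To make multi-hop traversal concrete, here is a minimal sketch in plain Python (not GraphRAG code). It stores a toy graph as adjacency sets and uses breadth-first search to return the explicit chain of entities that connects two concepts. The graph and its entity names are hypothetical.

```python
from collections import deque
from typing import Optional


def find_path(graph: dict[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Return the shortest chain of entities linking start to goal, or None."""
    if start not in graph or goal not in graph:
        return None
    previous: dict[str, Optional[str]] = {start: None}  # how each node was first reached
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Follow the predecessor links back to the start, then reverse.
            path = []
            step: Optional[str] = node
            while step is not None:
                path.append(step)
                step = previous[step]
            return path[::-1]
        for neighbor in sorted(graph[node]):  # sorted for deterministic output
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# A hypothetical, undirected toy graph around the "coffee" example.
edges = [
    ("COFFEE", "CAFFEINE"),
    ("COFFEE", "BEANS"),
    ("BEANS", "ETHIOPIA"),
    ("CAFFEINE", "ALERTNESS"),
]
graph: dict[str, set[str]] = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

print(" -> ".join(find_path(graph, "ETHIOPIA", "ALERTNESS")))
# ETHIOPIA -> BEANS -> COFFEE -> CAFFEINE -> ALERTNESS
```

Each hop in the returned path corresponds to one stored relationship, which is what makes the reasoning behind a multi-hop answer inspectable.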

While [knowledge graphs are not a new concept](https://blog.google/products/search/introducing-knowledge-graph-things-not/), their creation has traditionally been a resource-intensive process. Early knowledge graphs were built either through manual curation by domain experts or by converting existing structured data from relational databases. This limited both their scale and adaptability to new domains.

<img src="./media/table_comp.png" width=600>

*[What is a Knowledge Graph (KG)?](https://zilliz.com/learn/what-is-knowledge-graph)*

The introduction of LLMs has transformed this landscape. LLMs' capabilities in NLP, reasoning, and relationship extraction now enable automated construction of knowledge graphs from unstructured text. These models can identify entities, infer relationships, and structure information in ways that previously required extensive manual labor. This also allows knowledge graphs to be updated and expanded as new information becomes available, making them more practical and scalable for real-world applications.

To see this in action ourselves and compare it to traditional vector similarity techniques, we'll look at Microsoft's open-source [GraphRAG](https://microsoft.github.io/graphrag/) and how it works behind the scenes.

---
## 3 Main Components of Knowledge Graphs

**Entity**

<img src="./media/entities.png" width=500>

An Entity is a distinct object, person, place, event, or concept that has been extracted from a chunk of text through LLM analysis. Entities form the nodes of the knowledge graph. During the creation of the knowledge graph, when duplicate entities are found they are merged while preserving their various descriptions, creating a comprehensive representation of each unique entity.

**Relationship**

<img src="./media/relationship.png" width=400>

A Relationship defines a connection between two entities in the knowledge graph. These connections are extracted directly from text units through LLM analysis, alongside entities. Each relationship includes a source entity, target entity, and descriptive information about their connection. When duplicate relationships are found between the same entities, they are merged by combining their descriptions to create a more complete understanding of the connection.

**Community**

<img src="./media/communities.png" width=400>

A Community is a cluster of related entities and relationships identified through hierarchical community detection, generally using the [Leiden Algorithm](https://en.wikipedia.org/wiki/Leiden_algorithm). Communities create a structured way to understand different levels of granularity within the knowledge graph, from broad overviews at the top level to detailed local clusters at lower levels. This hierarchical structure helps in organizing and navigating complex knowledge graphs.
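
GraphRAG relies on a library implementation of hierarchical Leiden, which we won't reproduce here. To show the core idea in plain Python, the sketch below swaps in something simpler: the greedy "local moving" phase that both Louvain and Leiden start from, where each node repeatedly moves to the neighboring community that most increases modularity. It skips Leiden's refinement and aggregation phases, so it returns a single flat level rather than a hierarchy. The graph is a hypothetical toy: two tightly connected groups of four nodes joined by one edge.

```python
from collections import defaultdict


def local_moving(adjacency: dict[str, set[str]], max_passes: int = 20) -> dict[str, str]:
    """Greedy modularity "local moving" on an unweighted, undirected graph.

    Returns a mapping from node to community label. This is only the first phase of
    Louvain/Leiden: no refinement and no aggregation, so the result is one flat level.
    """
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    two_m = sum(degree.values())  # twice the number of edges
    community = {node: node for node in adjacency}  # every node starts alone
    if two_m == 0:
        return community
    total = defaultdict(int)  # sum of member degrees per community
    for node, k in degree.items():
        total[community[node]] += k

    for _ in range(max_passes):
        moved = False
        for node in sorted(adjacency):  # fixed visiting order keeps results reproducible
            k, current = degree[node], community[node]
            total[current] -= k  # temporarily remove the node from its community

            # Number of edges from this node into each neighboring community.
            links = defaultdict(int)
            for neighbor in adjacency[node]:
                links[community[neighbor]] += 1

            def gain(c: str) -> float:
                # Modularity gain of placing the node in c, up to a positive constant.
                return links.get(c, 0) - k * total[c] / two_m

            best, best_gain = current, gain(current)
            for candidate in sorted(links):
                if gain(candidate) > best_gain:  # strict: ties keep the earlier choice
                    best, best_gain = candidate, gain(candidate)

            community[node] = best
            total[best] += k
            moved = moved or best != current
        if not moved:
            break
    return community


# Hypothetical toy graph: two tightly knit groups joined by a single D-E edge.
edges = [
    ("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D"),
    ("E", "F"), ("E", "G"), ("E", "H"), ("F", "G"), ("F", "H"), ("G", "H"),
    ("D", "E"),
]
adjacency: dict[str, set[str]] = defaultdict(set)
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)

labels = local_moving(dict(adjacency))
groups = defaultdict(list)
for node, label in sorted(labels.items()):
    groups[label].append(node)
print(sorted(groups.values()))  # [['A', 'B', 'C', 'D'], ['E', 'F', 'G', 'H']]
```

Leiden would additionally refine these groups to guarantee they are well connected, then collapse each group into a single node and repeat, which is where the multi-level hierarchy comes from.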

---
## GraphRAG Creation Data Flow

<img src=./media/graph_building.png width=1000>

Indexing in GraphRAG is an extensive process: we load the document, split it into chunks, extract a subgraph from each chunk, merge these subgraphs into our final graph, algorithmically identify communities, and then summarize each community's main features.

### **Loading and Splitting Our Text**

For our example, we'll be using [The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities](https://arxiv.org/pdf/2408.13296).

The paper is loaded as a plain-text file (with the index, glossary, and references removed) and split into chunks of 1,200 tokens with a 100-token overlap.
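
`TokenTextSplitter` takes care of this for us. Conceptually, it encodes the text into tokens (with tiktoken) and slides a fixed-size window over them, advancing by `chunk_size - chunk_overlap` tokens each time. A minimal stdlib sketch of that windowing, with integers standing in for real tokens (an illustration of the idea, not LangChain's code):

```python
def chunk_tokens(tokens: list, chunk_size: int = 1200, overlap: int = 100) -> list[list]:
    """Split a token sequence into overlapping windows.

    Each window starts chunk_size - overlap tokens after the previous one,
    so consecutive windows share `overlap` tokens.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):  # this window already reached the end
            break
    return chunks


# Stand-in "tokens": the integers 0..2999.
chunks = chunk_tokens(list(range(3000)))
print([(c[0], c[-1]) for c in chunks])  # [(0, 1199), (1100, 2299), (2200, 2999)]
```

The overlap means a sentence cut at a chunk boundary still appears whole in at least one chunk, at the cost of a few duplicated tokens per chunk.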

In [1]:
from langchain_text_splitters import TokenTextSplitter

with open("./paper/input/2408.13296v3.txt", "r", encoding="utf-8") as file:
    content = file.read()

text_splitter = TokenTextSplitter(chunk_size=1200, chunk_overlap=100)

texts = text_splitter.split_text(content)

In [2]:
print(f"Split into {len(texts)} documents.")
print(texts[0])

Split into 46 documents.
The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities

(Version 1.1)

Venkatesh Balavadhani Parthasarathy, Ahtsham Zafar, Aafaq Khan, and Arsalan Shahid

@ CeADAR Connect Group

CeADAR: Ireland's Centre for AI, University College Dublin, Belfield, Dublin, Ireland { venkatesh.parthasarathy, ahtsham.zafar, aafaq.khan, arsalan.shahid } @ ucd.ie

October 2024

Abstract

This technical report thoroughly examines the process of fine-tuning Large Language Models (LLMs), integrating theoretical insights and practical applications. It begins by tracing the historical development of LLMs, emphasising their evolution from traditional Natural Language Processing (NLP) models and their pivotal role in modern AI systems. The analysis differentiates between various fine-tuning methodologies, including supervised, unsupervised, and instruction-based appr

**Entity and Relationship Extraction Prompt**

This is an [auto-tuned](https://microsoft.github.io/graphrag/prompt_tuning/auto_prompt_tuning/) entity extraction prompt taken from our actual GraphRAG run, reproduced here as a LangChain prompt so we can see what happens at this step.

In [3]:
!pip install langchain-openai python-dotenv -q

In [4]:
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

from dotenv import load_dotenv

load_dotenv()


llm = ChatOpenAI(temperature=0.0, model="gpt-4o-mini")

prompt_template = """
-Goal-
Given a text document that is potentially relevant to this activity and a list of entity types, identify all entities of those types from the text and all relationships among the identified entities.

-Steps-
1. Identify all entities. For each identified entity, extract the following information:
- entity_name: Name of the entity, capitalized
- entity_type: One of the following types: [large language model, differential privacy, federated learning, healthcare, adversarial training, security measures, open-source tool, dataset, learning rate, AdaGrad, RMSprop, adapter architecture, LoRA, API, model support, evaluation metrics, deployment, Python library, hardware accelerators, hyperparameters, data preprocessing, data imbalance, GPU-based deployment, distributed inference]
- entity_description: Comprehensive description of the entity's attributes and activities
Format each entity as ("entity"{{tuple_delimiter}}<entity_name>{{tuple_delimiter}}<entity_type>{{tuple_delimiter}}<entity_description>)

2. From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly related* to each other.
For each pair of related entities, extract the following information:
- source_entity: name of the source entity, as identified in step 1
- target_entity: name of the target entity, as identified in step 1
- relationship_description: explanation as to why you think the source entity and the target entity are related to each other
- relationship_strength: an integer score between 1 to 10, indicating strength of the relationship between the source entity and target entity
Format each relationship as ("relationship"{{tuple_delimiter}}<source_entity>{{tuple_delimiter}}<target_entity>{{tuple_delimiter}}<relationship_description>{{tuple_delimiter}}<relationship_strength>)

3. Return output in The primary language of the provided text is "English." as a single list of all the entities and relationships identified in steps 1 and 2. Use **{{record_delimiter}}** as the list delimiter.

4. If you have to translate into The primary language of the provided text is "English.", just translate the descriptions, nothing else!

5. When finished, output {{completion_delimiter}}.

-Examples-
######################

Example 1:

entity_types: [large language model, differential privacy, federated learning, healthcare, adversarial training, security measures, open-source tool, dataset, learning rate, AdaGrad, RMSprop, adapter architecture, LoRA, API, model support, evaluation metrics, deployment, Python library, hardware accelerators, hyperparameters, data preprocessing, data imbalance, GPU-based deployment, distributed inference]
text:
 LLMs to create synthetic samples that mimic clients’ private data distribution using
differential privacy. This approach significantly boosts SLMs’ performance by approximately 5% while
maintaining data privacy with a minimal privacy budget, outperforming traditional methods relying
solely on local private data.
In healthcare, federated fine-tuning can allow hospitals to collaboratively train models on patient data
without transferring sensitive information. This approach ensures data privacy while enabling the de-
velopment of robust, generalisable AI systems.
8https://ai.meta.com/responsible-ai/
9https://huggingface.co/docs/hub/en/model-cards
10https://www.tensorflow.org/responsible_ai/privacy/guide
101 Frameworks for Enhancing Security
Adversarial training and robust security measures[111] are essential for protecting fine-tuned models
against attacks. The adversarial training approach involves training models with adversarial examples
to improve their resilience against malicious inputs. Microsoft Azure’s
------------------------
output:
("entity"{{tuple_delimiter}}DIFFERENTIAL PRIVACY{{tuple_delimiter}}differential privacy{{tuple_delimiter}}Differential privacy is a technique used to create synthetic samples that mimic clients' private data distribution while maintaining data privacy with a minimal privacy budget{{record_delimiter}}
("entity"{{tuple_delimiter}}HEALTHCARE{{tuple_delimiter}}healthcare{{tuple_delimiter}}In healthcare, federated fine-tuning allows hospitals to collaboratively train models on patient data without transferring sensitive information, ensuring data privacy{{record_delimiter}}
("entity"{{tuple_delimiter}}FEDERATED LEARNING{{tuple_delimiter}}federated learning{{tuple_delimiter}}Federated learning is a method that enables collaborative model training on decentralized data sources, such as hospitals, without sharing sensitive information{{record_delimiter}}
("entity"{{tuple_delimiter}}ADVERSARIAL TRAINING{{tuple_delimiter}}adversarial training{{tuple_delimiter}}Adversarial training involves training models with adversarial examples to improve their resilience against malicious inputs{{record_delimiter}}
("entity"{{tuple_delimiter}}SECURITY MEASURES{{tuple_delimiter}}security measures{{tuple_delimiter}}Robust security measures are essential for protecting fine-tuned models against attacks{{record_delimiter}}
("relationship"{{tuple_delimiter}}DIFFERENTIAL PRIVACY{{tuple_delimiter}}FEDERATED LEARNING{{tuple_delimiter}}Differential privacy is used in federated learning to maintain data privacy while training models collaboratively{{tuple_delimiter}}8{{record_delimiter}}
("relationship"{{tuple_delimiter}}HEALTHCARE{{tuple_delimiter}}FEDERATED LEARNING{{tuple_delimiter}}Federated learning is applied in healthcare to train models on patient data without transferring sensitive information{{tuple_delimiter}}9{{record_delimiter}}
("relationship"{{tuple_delimiter}}ADVERSARIAL TRAINING{{tuple_delimiter}}SECURITY MEASURES{{tuple_delimiter}}Adversarial training is a security measure used to protect models against attacks by improving their resilience{{tuple_delimiter}}8{{completion_delimiter}}
#############################


Example 2:

entity_types: [large language model, differential privacy, federated learning, healthcare, adversarial training, security measures, open-source tool, dataset, learning rate, AdaGrad, RMSprop, adapter architecture, LoRA, API, model support, evaluation metrics, deployment, Python library, hardware accelerators, hyperparameters, data preprocessing, data imbalance, GPU-based deployment, distributed inference]
text:
ARD [82] is an innovative open-source tool developed to enhance the safety of interactions
with large language models (LLMs). This tool addresses three critical moderation tasks: detecting
2https://huggingface.co/docs/transformers/en/model_doc/auto#transformers.AutoModelForCausalLM
63 harmful intent in user prompts, identifying safety risks in model responses, and determining when a
model appropriately refuses unsafe requests. Central to its development is WILDGUARD MIX3, a
meticulously curated dataset comprising 92,000 labelled examples that include both benign prompts and
adversarial attempts to bypass safety measures. The dataset is divided into WILDGUARD TRAIN, used
for training the model, and WILDGUARD TEST, consisting of high-quality human-annotated examples
for evaluation.
The WILDGUARD model itself is fine-tuned on the Mistral-7B language model using the WILDGUARD
TRAIN dataset, enabling it to perform all
------------------------
output:
```plaintext
("entity"{{tuple_delimiter}}ARD{{tuple_delimiter}}open-source tool{{tuple_delimiter}}ARD is an innovative open-source tool developed to enhance the safety of interactions with large language models by addressing moderation tasks such as detecting harmful intent, identifying safety risks, and determining appropriate refusals of unsafe requests)
{{record_delimiter}}
("entity"{{tuple_delimiter}}LARGE LANGUAGE MODELS{{tuple_delimiter}}large language model{{tuple_delimiter}}Large language models (LLMs) are advanced AI models designed to understand and generate human-like text, which ARD aims to interact with safely)
{{record_delimiter}}
("entity"{{tuple_delimiter}}WILDGUARD MIX3{{tuple_delimiter}}dataset{{tuple_delimiter}}WILDGUARD MIX3 is a meticulously curated dataset comprising 92,000 labeled examples, including benign prompts and adversarial attempts, used for training and evaluating safety measures in language models)
{{record_delimiter}}
("entity"{{tuple_delimiter}}WILDGUARD TRAIN{{tuple_delimiter}}dataset{{tuple_delimiter}}WILDGUARD TRAIN is a subset of the WILDGUARD MIX3 dataset used specifically for training the model on safety measures)
{{record_delimiter}}
("entity"{{tuple_delimiter}}WILDGUARD TEST{{tuple_delimiter}}dataset{{tuple_delimiter}}WILDGUARD TEST is a subset of the WILDGUARD MIX3 dataset consisting of high-quality human-annotated examples used for evaluating the model's performance)
{{record_delimiter}}
("entity"{{tuple_delimiter}}MISTRAL-7B{{tuple_delimiter}}large language model{{tuple_delimiter}}Mistral-7B is a language model that the WILDGUARD model is fine-tuned on using the WILDGUARD TRAIN dataset to enhance its safety performance)
{{record_delimiter}}
("entity"{{tuple_delimiter}}ADVERSARIAL ATTEMPTS{{tuple_delimiter}}adversarial training{{tuple_delimiter}}Adversarial attempts are part of the WILDGUARD MIX3 dataset, used to test and improve the model's ability to handle unsafe or harmful inputs)
{{record_delimiter}}
("entity"{{tuple_delimiter}}SAFETY MEASURES{{tuple_delimiter}}security measures{{tuple_delimiter}}Safety measures are protocols and techniques implemented to ensure that large language models interact safely with users, which ARD and the WILDGUARD dataset aim to enhance)
{{record_delimiter}}
("relationship"{{tuple_delimiter}}ARD{{tuple_delimiter}}LARGE LANGUAGE MODELS{{tuple_delimiter}}ARD is designed to enhance the safety of interactions with large language models by addressing critical moderation tasks{{tuple_delimiter}}8)
{{record_delimiter}}
("relationship"{{tuple_delimiter}}ARD{{tuple_delimiter}}WILDGUARD MIX3{{tuple_delimiter}}ARD uses the WILDGUARD MIX3 dataset to train and evaluate its moderation capabilities{{tuple_delimiter}}7)
{{record_delimiter}}
("relationship"{{tuple_delimiter}}WILDGUARD MIX3{{tuple_delimiter}}WILDGUARD TRAIN{{tuple_delimiter}}WILDGUARD TRAIN is a subset of the WILDGUARD MIX3 dataset used for training{{tuple_delimiter}}9)
{{record_delimiter}}
("relationship"{{tuple_delimiter}}WILDGUARD MIX3{{tuple_delimiter}}WILDGUARD TEST{{tuple_delimiter}}WILDGUARD TEST is a subset of the WILDGUARD MIX3 dataset used for evaluation{{tuple_delimiter}}9)
{{record_delimiter}}
("relationship"{{tuple_delimiter}}WILDGUARD TRAIN{{tuple_delimiter}}MISTRAL-7B{{tuple_delimiter}}The WILDGUARD TRAIN dataset is used to fine-tune the Mistral-7B language model{{tuple_delimiter}}8)
{{record_delimiter}}
("relationship"{{tuple_delimiter}}ADVERSARIAL ATTEMPTS{{tuple_delimiter}}SAFETY MEASURES{{tuple_delimiter}}Adversarial attempts are used to test and improve safety measures in language models{{tuple_delimiter}}7)
{{completion_delimiter}}
```
#############################



-Real Data-
######################
entity_types: [large language model, differential privacy, federated learning, healthcare, adversarial training, security measures, open-source tool, dataset, learning rate, AdaGrad, RMSprop, adapter architecture, LoRA, API, model support, evaluation metrics, deployment, Python library, hardware accelerators, hyperparameters, data preprocessing, data imbalance, GPU-based deployment, distributed inference]
text: {input_text}
######################
output:
"""

prompt = ChatPromptTemplate.from_template(prompt_template)

chain = prompt | llm | StrOutputParser()

**Creating a Response**

In [5]:
response = chain.invoke({"input_text": texts[25]})

In [6]:
print(response)

```plaintext
("entity"{tuple_delimiter}DEPLOYMENT ENVIRONMENT{tuple_delimiter}deployment{tuple_delimiter}The deployment environment includes necessary hardware, cloud services, and containerization tools required for deploying models in production.)
{record_delimiter}
("entity"{tuple_delimiter}API DEVELOPMENT{tuple_delimiter}API{tuple_delimiter}API development involves creating APIs that allow applications to interact with models, facilitating prediction requests and responses.)
{record_delimiter}
("entity"{tuple_delimiter}CLOUD-BASED LARGE LANGUAGE MODEL INFERENCING{tuple_delimiter}large language model{tuple_delimiter}Cloud-based large language model inferencing uses a pricing model based on the number of tokens processed, charging users according to the volume of text analyzed or generated.)
{record_delimiter}
("entity"{tuple_delimiter}SELF-HOSTING{tuple_delimiter}deployment{tuple_delimiter}Self-hosting allows organizations to manage their own infrastructure for LLM solutions, provid

We see the extraction of **entities**:

`("entity"{tuple_delimiter}EVALUATION METRICS{tuple_delimiter}evaluation metrics{tuple_delimiter}Evaluation metrics are criteria  used to assess the performance of AI models, including metrics like cross-entropy, perplexity, factuality, and context relevance)`

As well as **relationships**:

`("relationship"{tuple_delimiter}EVALUATION METRICS{tuple_delimiter}CONTEXT RELEVANCE{tuple_delimiter}Context relevance is an evaluation metric that ensures the model uses the most pertinent information for generating responses{tuple_delimiter}8)`
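
Before anything can be merged, this delimited output has to be parsed into records. Below is a minimal parser sketch written against the format shown above. Because the template doubles its braces, LangChain renders the delimiters as the literal placeholders `{tuple_delimiter}` and `{record_delimiter}`; a real GraphRAG run substitutes concrete delimiter strings instead, so treat the defaults here as assumptions and pass in whatever your configuration uses. This is not GraphRAG's own parser, and the sample records are abbreviated versions of the two quoted above.

```python
import re


def parse_extraction(
    raw: str,
    tuple_delimiter: str = "{tuple_delimiter}",
    record_delimiter: str = "{record_delimiter}",
    completion_delimiter: str = "{completion_delimiter}",
) -> tuple[list[dict], list[dict]]:
    """Split raw extraction output into entity and relationship dictionaries."""
    entities, relationships = [], []
    for record in raw.replace(completion_delimiter, "").split(record_delimiter):
        # Take everything between the first "(" and the last ")" of the record;
        # stray text such as code fences or blank lines is skipped.
        match = re.search(r"\((.*)\)", record, flags=re.DOTALL)
        if match is None:
            continue
        fields = [f.strip().strip('"') for f in match.group(1).split(tuple_delimiter)]
        if fields[0] == "entity" and len(fields) >= 4:
            entities.append({"name": fields[1], "type": fields[2], "description": fields[3]})
        elif fields[0] == "relationship" and len(fields) >= 5:
            try:
                strength = float(fields[4])
            except ValueError:
                continue  # skip records whose strength is not a number
            relationships.append({
                "source": fields[1],
                "target": fields[2],
                "description": fields[3],
                "strength": strength,
            })
    return entities, relationships


sample = (
    '("entity"{tuple_delimiter}EVALUATION METRICS{tuple_delimiter}evaluation metrics'
    "{tuple_delimiter}Evaluation metrics are criteria used to assess the performance of AI models)\n"
    "{record_delimiter}\n"
    '("relationship"{tuple_delimiter}EVALUATION METRICS{tuple_delimiter}CONTEXT RELEVANCE'
    "{tuple_delimiter}Context relevance is an evaluation metric{tuple_delimiter}8)\n"
    "{completion_delimiter}"
)
parsed_entities, parsed_relationships = parse_extraction(sample)
print(parsed_entities[0]["name"], "|", parsed_relationships[0]["target"], parsed_relationships[0]["strength"])
# EVALUATION METRICS | CONTEXT RELEVANCE 8.0
```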

Next, the per-chunk subgraphs are merged. Entities with the same name and type are combined by collecting their descriptions into a list; likewise, relationships with the same source and target are combined by collecting their descriptions. Each list is then summarized by the LLM into a single, concise description.
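
A minimal sketch of this merge step, assuming each chunk's extraction has already been parsed into plain dictionaries (a hypothetical shape chosen for this example). The final LLM summarization is represented by a placeholder function, `join_unique`, which simply de-duplicates and concatenates; it is a stand-in, not what GraphRAG does.

```python
from collections import defaultdict
from typing import Callable


def merge_subgraphs(
    chunks: list[dict],
    summarize: Callable[[list[str]], str],
) -> tuple[dict, dict]:
    """Merge per-chunk entities/relationships, then collapse their description lists."""
    entity_descriptions = defaultdict(list)        # (name, type) -> descriptions
    relationship_descriptions = defaultdict(list)  # (source, target) -> descriptions
    for chunk in chunks:
        for e in chunk["entities"]:
            entity_descriptions[(e["name"], e["type"])].append(e["description"])
        for r in chunk["relationships"]:
            relationship_descriptions[(r["source"], r["target"])].append(r["description"])

    # One more pass turns each description list into a single description.
    entities = {key: summarize(descs) for key, descs in entity_descriptions.items()}
    relationships = {key: summarize(descs) for key, descs in relationship_descriptions.items()}
    return entities, relationships


def join_unique(descriptions: list[str]) -> str:
    """Hypothetical stand-in for the LLM summarization call."""
    return " ".join(dict.fromkeys(descriptions))  # drop exact duplicates, keep order


# Hypothetical extraction results from two chunks.
chunks = [
    {
        "entities": [
            {"name": "LORA", "type": "adapter architecture",
             "description": "LoRA adds trainable low-rank matrices to frozen weights."},
            {"name": "QLORA", "type": "adapter architecture",
             "description": "QLoRA fine-tunes quantized models with LoRA adapters."},
        ],
        "relationships": [
            {"source": "QLORA", "target": "LORA", "description": "QLoRA builds on LoRA."},
        ],
    },
    {
        "entities": [
            {"name": "LORA", "type": "adapter architecture",
             "description": "LoRA is a parameter-efficient fine-tuning method."},
        ],
        "relationships": [
            {"source": "QLORA", "target": "LORA", "description": "QLoRA builds on LoRA."},
        ],
    },
]
merged_entities, merged_relationships = merge_subgraphs(chunks, join_unique)
print(merged_entities[("LORA", "adapter architecture")])
# LoRA adds trainable low-rank matrices to frozen weights. LoRA is a parameter-efficient fine-tuning method.
```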

### **Looking at Final Entities and Relationships**

In [9]:
import pandas as pd

entities = pd.read_parquet('./paper/output/entities.parquet')

entities.head(20)

|  | id | human_readable_id | title | type | description | text_unit_ids | frequency | degree | x | y |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 3722afa2-33b1-4093-ab3d-b0d658137ea9 | 0 | VENKATESH BALAVADHANI PARTHASARATHY | PERSON | Venkatesh Balavadhani Parthasarathy is one of ... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 1 | 1 | 0.0 | 0.0 |
| 1 | 6871dfbf-33ec-473a-aa6e-f59558cf246a | 1 | AHTSHAM ZAFAR | PERSON | Ahtsham Zafar is one of the authors of the tec... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 1 | 1 | 0.0 | 0.0 |
| 2 | d65f2b8f-4032-464f-9546-5021972c5cd7 | 2 | AAFAQ KHAN | PERSON | Aafaq Khan is one of the authors of the techni... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 1 | 1 | 0.0 | 0.0 |
| 3 | c8cacad0-f473-4807-a0c7-58131731db51 | 3 | ARSALAN SHAHID | PERSON | Arsalan Shahid is one of the authors of the te... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 1 | 1 | 0.0 | 0.0 |
| 4 | 79fc51c4-ee7a-4bd5-aa1e-77d4747a67d0 | 4 | CEADAR CONNECT GROUP | ORGANIZATION | CeADAR Connect Group is associated with the re... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 1 | 6 | 0.0 | 0.0 |
| 5 | 9a9cd77f-02b7-44c1-9da8-c1be3db76aa9 | 5 | UNIVERSITY COLLEGE DUBLIN | ORGANIZATION | University College Dublin is the institution w... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 1 | 2 | 0.0 | 0.0 |
| 6 | dafa263c-1e6d-4160-8250-1c8e9a39cf68 | 6 | DUBLIN | GEO | Dublin is the capital city of Ireland, where U... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 1 | 0 |  |  |
| 7 | eb0140ee-158a-444f-a22f-8a3b3070a546 | 7 | FINE-TUNING OF LLMS | EVENT | The fine-tuning of Large Language Models (LLMs... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 1 | 2 | 0.0 | 0.0 |
| 8 | acd70242-033f-49a7-8792-7b9f1b351e44 | 8 | NATURAL LANGUAGE PROCESSING | EVENT | Natural Language Processing (NLP) is a special... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 2 | 1 | 0.0 | 0.0 |
| 9 | 17414fae-ee97-4ca5-8f56-14263ce28e7b | 9 | GPT-3 | EVENT | GPT-3 is a state-of-the-art language model dev... | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... | 2 | 4 | 0.0 | 0.0 |


In [10]:
relationships = pd.read_parquet('./paper/output/relationships.parquet')

relationships.head(20)

|  | id | human_readable_id | source | target | description | weight | combined_degree | text_unit_ids |
|---|---|---|---|---|---|---|---|---|
| 0 | 66aeff94-c931-47b9-8d05-7a144e1bffed | 0 | VENKATESH BALAVADHANI PARTHASARATHY | CEADAR CONNECT GROUP | Venkatesh Balavadhani Parthasarathy is affilia... | 8.0 | 7 | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... |
| 1 | 0af75879-f95d-429e-a087-50dd3a55b8fa | 1 | AHTSHAM ZAFAR | CEADAR CONNECT GROUP | Ahtsham Zafar is affiliated with the CeADAR Co... | 8.0 | 7 | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... |
| 2 | 129a277b-6f38-4d5b-b432-7b66bf4828be | 2 | AAFAQ KHAN | CEADAR CONNECT GROUP | Aafaq Khan is affiliated with the CeADAR Conne... | 8.0 | 7 | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... |
| 3 | 21b5fc78-0590-4603-846e-e1656fc7412d | 3 | ARSALAN SHAHID | CEADAR CONNECT GROUP | Arsalan Shahid is affiliated with the CeADAR C... | 8.0 | 7 | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... |
| 4 | 4c4fd723-81e6-4196-889a-660da3be2fd7 | 4 | CEADAR CONNECT GROUP | UNIVERSITY COLLEGE DUBLIN | CeADAR Connect Group is part of University Col... | 9.0 | 8 | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... |
| 5 | 134dd6fb-b8d3-4061-be15-3de126e2556d | 5 | CEADAR CONNECT GROUP | FINE-TUNING OF LLMS | The CeADAR Connect Group is involved in the re... | 7.0 | 8 | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... |
| 6 | 3cd1d3f1-69f6-4194-bce6-037cb8998805 | 6 | UNIVERSITY COLLEGE DUBLIN | FINE-TUNING OF LLMS | University College Dublin is the institution w... | 1.0 | 4 | [e23f40dc2b1f2299af343eaf1bb144930cb2b5047b7cd... |
| 7 | a5f00fd5-d21d-4698-9c32-0c1c0c72125a | 7 | GPT-2 | BERT | Both GPT-2 and BERT are pre-trained language m... | 7.0 | 5 | [e85559cf1d772032514c0673126bacb711eda0989e153... |
| 8 | 4d02bca4-03f9-48d3-8a27-c2d9bfc237e0 | 8 | GPT-2 | OPENAI | OpenAI developed the GPT-2 language model, whi... | 9.0 | 19 | [e85559cf1d772032514c0673126bacb711eda0989e153... |
| 9 | 33f37318-a466-48ea-8ab5-4cba22288d22 | 9 | BERT | GOOGLE | Google developed BERT, a pre-trained language ... | 9.0 | 4 | [e85559cf1d772032514c0673126bacb711eda0989e153... |
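
The `degree` column in the entities table appears to count how many relationships an entity takes part in, and the `combined_degree` values here are consistent with the sum of the two endpoint degrees (CEADAR CONNECT GROUP has degree 6 and UNIVERSITY COLLEGE DUBLIN has degree 2, giving 8). A short stdlib sketch that recomputes both from the first seven relationships listed above; for the entities involved, these rows reproduce the degrees shown in the entities table. On the full graph you would run the same logic over every row of `relationships`.

```python
from collections import Counter

# Source/target pairs copied from rows 0-6 of the relationships table.
edges = [
    ("VENKATESH BALAVADHANI PARTHASARATHY", "CEADAR CONNECT GROUP"),
    ("AHTSHAM ZAFAR", "CEADAR CONNECT GROUP"),
    ("AAFAQ KHAN", "CEADAR CONNECT GROUP"),
    ("ARSALAN SHAHID", "CEADAR CONNECT GROUP"),
    ("CEADAR CONNECT GROUP", "UNIVERSITY COLLEGE DUBLIN"),
    ("CEADAR CONNECT GROUP", "FINE-TUNING OF LLMS"),
    ("UNIVERSITY COLLEGE DUBLIN", "FINE-TUNING OF LLMS"),
]

# Degree: number of relationships each entity takes part in (edges treated as undirected).
degree = Counter()
for source, target in edges:
    degree[source] += 1
    degree[target] += 1

# combined_degree: sum of the degrees of a relationship's two endpoints.
combined_degree = {(s, t): degree[s] + degree[t] for s, t in edges}

print(degree["CEADAR CONNECT GROUP"])  # 6
print(combined_degree[("CEADAR CONNECT GROUP", "UNIVERSITY COLLEGE DUBLIN")])  # 8
```

A high `combined_degree` marks a relationship between well-connected entities, which makes it a reasonable signal for deciding which edges to keep when context space is limited.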


### **Community Detection & Node Embedding**

<img src="./media/leidan.png" width=600>

After we have our basic graph with entities and relationships, we analyze its structure in two ways. Community Detection uses the [Leiden algorithm](https://en.wikipedia.org/wiki/Leiden_algorithm) to find explicit groupings in the graph, creating a hierarchy of related entities. The lower in the hierarchy, the more granular the community. Node Embedding uses [Node2Vec](https://arxiv.org/abs/1607.00653) to create vector representations of each entity, capturing implicit relationships in the graph structure. These complementary approaches let us understand both obvious connections through communities and subtle patterns through embeddings.
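
Node2Vec learns embeddings by training a word2vec-style model on random walks whose transitions are biased by two parameters: the return parameter `p` and the in-out parameter `q`. The sketch below covers only the walk-generation half, in plain Python, on a hypothetical toy graph. The embedding training step is omitted, and none of this is GraphRAG's code.

```python
import random
from typing import Optional


def node2vec_walk(
    adjacency: dict[str, set[str]],
    start: str,
    length: int,
    p: float = 1.0,
    q: float = 1.0,
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Generate one second-order (Node2Vec-style) random walk on an unweighted graph."""
    if p <= 0 or q <= 0:
        raise ValueError("p and q must be positive")
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbors = sorted(adjacency[current])  # sorted so a seeded rng is reproducible
        if not neighbors:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            walk.append(rng.choice(neighbors))  # the first step has no history: uniform
            continue
        previous = walk[-2]
        weights = []
        for candidate in neighbors:
            if candidate == previous:
                weights.append(1.0 / p)  # return to the node we just left
            elif candidate in adjacency[previous]:
                weights.append(1.0)      # stay within distance 1 of the previous node
            else:
                weights.append(1.0 / q)  # move outward, away from the previous node
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph (undirected: every edge is listed in both directions).
adjacency = {
    "LORA": {"QLORA", "PEFT"},
    "QLORA": {"LORA", "QUANTIZATION"},
    "PEFT": {"LORA", "ADAPTERS"},
    "QUANTIZATION": {"QLORA"},
    "ADAPTERS": {"PEFT"},
}
rng = random.Random(42)
# q < 1 biases walks outward (DFS-like); a small p would make backtracking more likely.
walks = [node2vec_walk(adjacency, node, length=5, p=1.0, q=0.5, rng=rng) for node in sorted(adjacency)]
for walk in walks:
    print(" -> ".join(walk))
```

Node2Vec then treats each walk like a sentence and trains a skip-gram model on them, so entities that co-occur on walks end up with similar vectors.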

Combining all of this with our relationships gives us our final nodes.

In [39]:
# nodes = pd.read_parquet('./paper/output/create_final_nodes.parquet')

# nodes.head(10)

At this point the graph is effectively complete; however, we can add a few extra steps that enable more advanced retrieval.

### Community Report Generation & Summarization

Now that we have clear community groupings, another generation step aggregates the main concepts of each community across the hierarchy into a report, along with a shorter summary of that report. As with the nodes, these summaries are also run through an embedding model and stored in a vector store.
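
Each report is generated by giving the LLM the community's entities and relationships as context. GraphRAG's exact formatting and truncation rules are internals we don't reproduce; the sketch below is a hypothetical version of that context-assembly step. It lists member entities (most connected first), keeps only relationships with both endpoints inside the community (ordered by the sum of their endpoint degrees), and cuts the text at a character budget. A real pipeline would budget in tokens and then pass the text to the report-generation prompt. The toy data is hypothetical, loosely based on the PyTorch community report shown below.

```python
from collections import Counter


def build_community_context(
    members: set[str],
    entity_descriptions: dict[str, str],
    relationships: list[tuple[str, str, str]],
    max_chars: int = 2000,
) -> str:
    """Assemble plain-text context for a (hypothetical) community-report prompt."""
    # Degree of every entity across all relationships passed in, not just this community.
    degree = Counter()
    for source, target, _ in relationships:
        degree[source] += 1
        degree[target] += 1

    lines = ["Entities:"]
    for name in sorted(members, key=lambda n: (-degree[n], n)):  # most connected first
        lines.append(f"- {name}: {entity_descriptions.get(name, '')}")

    # Keep only relationships with both endpoints inside the community,
    # ordered by the sum of their endpoint degrees (highest first).
    internal = [r for r in relationships if r[0] in members and r[1] in members]
    internal.sort(key=lambda r: (-(degree[r[0]] + degree[r[1]]), r[0], r[1]))
    lines.append("Relationships:")
    for source, target, description in internal:
        lines.append(f"- {source} -> {target}: {description}")

    # Add whole lines until the character budget (including newlines) would be exceeded.
    kept, used = [], 0
    for line in lines:
        if used + len(line) + 1 > max_chars:
            break
        kept.append(line)
        used += len(line) + 1
    return "\n".join(kept)


descriptions = {
    "PYTORCH": "Open-source machine learning library.",
    "LARGE LANGUAGE MODELS": "Models trained to understand and generate text.",
    "HUGGING FACE": "Company hosting models, datasets, and libraries.",
    "META": "Company where PyTorch was originally developed.",
    "PAGERDUTY": "Incident-alerting service.",
}
relationships = [
    ("PYTORCH", "LARGE LANGUAGE MODELS", "PyTorch is used to train LLMs."),
    ("META", "PYTORCH", "Meta originally developed PyTorch."),
    ("HUGGING FACE", "PYTORCH", "Hugging Face libraries support PyTorch."),
    ("HUGGING FACE", "LARGE LANGUAGE MODELS", "Hugging Face hosts many LLMs."),
    ("PAGERDUTY", "PYTORCH", "Hypothetical edge that crosses the community boundary."),
]
community = {"PYTORCH", "LARGE LANGUAGE MODELS", "HUGGING FACE", "META"}
print(build_community_context(community, descriptions, relationships))
```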

In [17]:
community_reports = pd.read_parquet('./paper/output/community_reports.parquet')

community_reports.head()

|  | id | human_readable_id | community | level | parent | children | title | summary | full_content | rank | rating_explanation | findings | full_content_json | period | size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 082bb8c74db541a48fb16acfe85e3c67 | 72 | 72 | 2 | 21 | [] | PyTorch and Large Language Models Community | The community centers around PyTorch, a leadin... | `# PyTorch and Large Language Models Community\...` | 8.0 | The impact severity rating is high due to the ... | [{'explanation': 'PyTorch is a widely used ope... | `{\n "title": "PyTorch and Large Language Mo...` | 2025-12-14 | 26 |
| 1 | cf5a75cc720e4ed29547f2c43dd86ea0 | 73 | 73 | 2 | 21 | [] | Validation Loops and Trends in Validation Metrics | The community focuses on the relationship betw... | `# Validation Loops and Trends in Validation Me...` | 7.0 | The impact severity rating is high due to the ... | [{'explanation': 'Validation Loops are critica... | `{\n "title": "Validation Loops and Trends i...` | 2025-12-14 | 2 |
| 2 | a91f21e9e4ed4ab5a0e927b72f1b10b2 | 74 | 74 | 2 | 31 | [] | HuggingFace and AI Safety Models | The community centers around HuggingFace, a le... | `# HuggingFace and AI Safety Models\n\nThe comm...` | 8.5 | The impact severity rating is high due to the ... | [{'explanation': 'HuggingFace is a prominent c... | `{\n "title": "HuggingFace and AI Safety Mod...` | 2025-12-14 | 8 |
| 3 | 0a4ce9c567044ee5a1fd0ca1563030ba | 75 | 75 | 2 | 31 | [] | QLoRA and Its Applications in NLP | The community centers around QLoRA, an advance... | `# QLoRA and Its Applications in NLP\n\nThe com...` | 7.5 | The impact severity rating is high due to the ... | [{'explanation': 'QLoRA is a significant advan... | `{\n "title": "QLoRA and Its Applications in...` | 2025-12-14 | 3 |
| 4 | c7c398031b234d9ca0a0ec4986978cf2 | 19 | 19 | 1 | 0 | [] | PagerDuty and Alerting Systems Community | The community centers around PagerDuty, an inc... | `# PagerDuty and Alerting Systems Community\n\n...` | 7.5 | The impact severity rating is high due to the ... | [{'explanation': 'PagerDuty serves as a crucia... | `{\n "title": "PagerDuty and Alerting System...` | 2025-12-14 | 3 |


In [18]:
print(community_reports["full_content"][0])

# PyTorch and Large Language Models Community

The community centers around PyTorch, a leading machine learning library, and its relationship with Large Language Models (LLMs) and associated entities like Hugging Face and Meta. These entities collaborate to advance the development and ethical deployment of AI technologies, particularly in natural language processing.

## PyTorch as a foundational framework for LLMs

PyTorch is a widely used open-source machine learning library that plays a crucial role in the development and training of Large Language Models (LLMs). It provides a flexible platform that supports both the initialization and fine-tuning of these models, making it integral to the field of natural language processing. The framework's capabilities enable researchers and developers to create sophisticated LLMs that can understand and generate human-like text, which is essential for various applications, from chatbots to advanced AI systems. This foundational role underscores 

In [19]:
print(community_reports["summary"][0])

The community centers around PyTorch, a leading machine learning library, and its relationship with Large Language Models (LLMs) and associated entities like Hugging Face and Meta. These entities collaborate to advance the development and ethical deployment of AI technologies, particularly in natural language processing.


### The Final Graph!

<img src="./media/ghraphrag_viz.svg" width=800>

*[Full Size PDF](./ghraphrag_viz.pdf)*

---

## GraphRAG Retrieval

<img src="./media/kg_retrieval.png" width=600>

*[Unifying Large Language Models and Knowledge Graphs: A Roadmap](https://arxiv.org/pdf/2306.08302)*

With our knowledge graph constructed and its hierarchical communities delineated, we can now run several types of search that take advantage of both the graph structure and the different levels of specificity across our communities. Specifically:

1. **Global Search**: Uses the LLM-generated community reports from a specified level of the graph's community hierarchy as context to generate a response.
2. **Local Search**: Combines structured data from the knowledge graph with unstructured data from the input document(s) to augment the LLM context with relevant entity information.
3. **DRIFT Search** (Dynamic Reasoning and Inference with Flexible Traversal): Extends local search queries by including community information in the search process, combining elements of global and local search.
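
GraphRAG's documentation describes global search as a map-reduce: in the map step, community reports are used to produce intermediate answers with importance ratings; in the reduce step, the highest-rated points are aggregated and passed to the LLM for the final answer. A minimal sketch of that control flow, with the LLM replaced by a hypothetical keyword-overlap scorer so it runs offline (the reports below are toy examples, not real output):

```python
import re
from typing import Callable


def global_search(
    query: str,
    reports: list[str],
    map_fn: Callable[[str, str], tuple[str, float]],
    top_k: int = 3,
) -> str:
    """Map-reduce over community reports (a sketch; the real steps are LLM calls)."""
    # Map: get a partial answer and an importance score from every report independently.
    partials = [map_fn(query, report) for report in reports]
    # Keep answers scored above zero, most important first (the sort is stable on ties).
    ranked = sorted((p for p in partials if p[1] > 0), key=lambda p: p[1], reverse=True)
    if not ranked:
        return "No relevant information found."
    # Reduce: here we just join the top answers; GraphRAG has the LLM synthesize them.
    return "\n".join(answer for answer, _ in ranked[:top_k])


def _words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9-]+", text.lower()))


def keyword_map(query: str, report: str) -> tuple[str, float]:
    """Hypothetical stand-in for the LLM map step: score = number of shared words."""
    return report, float(len(_words(query) & _words(report)))


reports = [
    "PyTorch is a foundational framework for training large language models.",
    "PagerDuty routes alerts from monitoring systems to on-call engineers.",
    "QLoRA reduces memory use when fine-tuning large language models.",
]
print(global_search("How are large language models fine-tuned?", reports, keyword_map, top_k=2))
```

Because every report is processed independently in the map step, this pattern parallelizes well and can cover the whole corpus, which is why global search suits broad "what are the main themes" questions.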

**GraphRAG Retrieval Function**

*Note: For a simpler example, we wrap the [GraphRAG CLI tool](https://microsoft.github.io/graphrag/cli/) in a function here instead of using its [library](https://microsoft.github.io/graphrag/examples_notebooks/api_overview/) API. As such, the notebook needs to run in the same environment/kernel where GraphRAG is installed.*

In [24]:
import subprocess
import shlex
from typing import Optional

def query_graphrag(
    query: str,
    method: str = "global",
    root_path: str = "./paper",
    timeout: Optional[int] = None,
    community_level: int = 2,
    dynamic_community_selection: bool = False
) -> str:
    """
    Execute a GraphRAG query using the CLI tool.
    
    Args:
        query (str): The query string to process
        method (str): Query method (e.g., "global", "local", or "drift")
        root_path (str): Path to the root directory
        timeout (int, optional): Timeout in seconds for the command
        community_level (int): The community level in the Leiden community hierarchy (default: 2)
        dynamic_community_selection (bool): Whether to use global search with dynamic community selection (default: False)
    
    Returns:
        str: The output from GraphRAG
        
    Raises:
        subprocess.CalledProcessError: If the command fails
        subprocess.TimeoutExpired: If the command times out
        ValueError: If community_level is negative
    """
    # Validate community level
    if community_level < 0:
        raise ValueError("Community level must be non-negative")
    
    # Construct the base command
    command = [
        'graphrag', 'query',
        '--root', root_path,
        '--method', method,
        '--query', query,
        '--community-level', str(community_level)
    ]
    
    # Add dynamic community selection flag if enabled
    if dynamic_community_selection:
        command.append('--dynamic-community-selection')
    
    try:
        # Execute the command and capture output
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout
        )
        
        # Check if the command was successful
        result.check_returncode()
        
        return result.stdout.strip()
        
    except subprocess.CalledProcessError as e:
        error_message = f"Command failed with exit code {e.returncode}\nError: {e.stderr}"
        raise subprocess.CalledProcessError(
            e.returncode,
            e.cmd,
            output=e.output,
            stderr=error_message
        )

### Local Search

<img src="./media/local_search.png" width=900>

The GraphRAG approach to local search is the most similar to regular semantic RAG search. It combines structured data from the knowledge graph with unstructured data from the input documents to augment the LLM context with relevant entity information. In essence, we first use semantic search to find entities relevant to the query. These become the entry points on our graph that we can now traverse. Starting at these points, we look at connected chunks of text, community reports, other entities, and the relationships between them. All of the retrieved data is filtered and ranked to fit into a predefined context window.
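The entry-point-then-traverse idea can be sketched in plain Python. This is a toy illustration, not GraphRAG's implementation: the entity vectors, relationships, text units, and the `local_search_context` helper are all hypothetical, only one hop is traversed, and the "ranking" is simply traversal order followed by deduplication and truncation to a budget counted in items rather than tokens.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy index: entity embeddings, relationships, and source text units.
entity_vectors = {
    "LoRA": [0.9, 0.1, 0.0],
    "QLoRA": [0.8, 0.2, 0.1],
    "RAG": [0.1, 0.9, 0.2],
    "PyTorch": [0.0, 0.2, 0.9],
}
relationships = [("LoRA", "QLoRA"), ("LoRA", "PyTorch"), ("RAG", "PyTorch")]
text_units = {
    "LoRA": ["LoRA trains low-rank adapter matrices."],
    "QLoRA": ["QLoRA adds 4-bit quantization to LoRA."],
    "RAG": ["RAG retrieves documents at query time."],
    "PyTorch": ["PyTorch is a deep learning framework."],
}

def local_search_context(query_vector, top_k=1, budget=4):
    """Collect context around the entities most similar to the query vector."""
    # 1. Semantic search over entities gives the graph entry points.
    ranked = sorted(entity_vectors,
                    key=lambda e: cosine(query_vector, entity_vectors[e]),
                    reverse=True)
    entry_points = ranked[:top_k]
    # 2. Traverse one hop: gather linked text, relationships, and neighbor text.
    candidates = []
    for entity in entry_points:
        candidates.extend(text_units[entity])
        for src, dst in relationships:
            if entity in (src, dst):
                neighbor = dst if src == entity else src
                candidates.append(f"{src} -> {dst}")
                candidates.extend(text_units[neighbor])
    # 3. Deduplicate (keeping traversal order) and truncate to the context budget.
    unique = list(dict.fromkeys(candidates))
    return entry_points, unique[:budget]

entry, context = local_search_context([1.0, 0.0, 0.0])
print(entry)
print(context)
```

A real implementation would rank candidates by relevance and count tokens against the model's context window; the fixed budget here only stands in for that step.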

In [None]:
result = query_graphrag(
    query="How does a company choose between RAG, fine-tuning, and different PEFT approaches?",
    method="local"
)
print("Query result:")
print(result)

Choosing between Retrieval-Augmented Generation (RAG), fine-tuning, and various Parameter-Efficient Fine-Tuning (PEFT) approaches involves several considerations that align with the specific needs and constraints of a company. Each method has its strengths and weaknesses, making the decision context-dependent.

### Understanding the Options

**Retrieval-Augmented Generation (RAG)** is a method that enhances the capabilities of language models by integrating real-time data retrieval into the generation process. This approach is particularly beneficial for applications requiring up-to-date information, as it allows models to generate contextually relevant responses based on current data. RAG is advantageous when the goal is to provide accurate and timely responses without the need for extensive model retraining [Data: Sources (2)].

**Fine-tuning**, on the other hand, involves adapting a pre-trained model to specific tasks by training it further on a smaller, task-specific dataset. This method is essential for improving model performance in specialized applications, such as Automatic Speech Recognition (ASR) or natural language processing (NLP) tasks. Fine-tuning is particularly effective when there is ample domain-specific data available, allowing the model to learn the nuances of the target domain [Data: Reports (6), Entities (32)].

**Parameter-Efficient Fine-Tuning (PEFT)** techniques, such as Low-Rank Adaptation (LoRA) and Quantized LoRA (QLoRA), focus on optimizing the fine-tuning process by reducing the number of parameters that need to be adjusted. These methods are designed to enhance efficiency, making them suitable for scenarios where computational resources are limited. PEFT approaches are particularly valuable when organizations need to fine-tune models without incurring high costs or requiring extensive computational power [Data: Reports (5), Entities (49)].

### Key Considerations for Decision-Making

1. **Data Availability**: If a company has access to a large volume of labeled data specific to its domain, fine-tuning may be the best option. However, if data is scarce or constantly changing, RAG could provide a more flexible solution by leveraging real-time data retrieval.

2. **Resource Constraints**: Companies with limited computational resources may benefit from PEFT techniques, which allow for effective model adaptation without the need for extensive retraining. Methods like LoRA and QLoRA are designed to minimize memory usage and computational load, making them ideal for environments with hardware limitations [Data: Reports (5), Entities (180)].

3. **Application Requirements**: The choice may also depend on the specific application requirements. For instance, if the goal is to maintain up-to-date responses in a dynamic environment, RAG would be preferable. Conversely, if the focus is on achieving high accuracy in a specific task, fine-tuning or PEFT methods may be more appropriate.

4. **Performance vs. Efficiency**: Companies must weigh the trade-offs between performance and efficiency. While fine-tuning can lead to superior performance in specialized tasks, PEFT methods offer a more efficient approach that can still achieve competitive results without the overhead of full model retraining [Data: Reports (6), Entities (32)].

5. **Long-term Maintenance**: Consideration of how the chosen method will affect long-term maintenance and scalability is crucial. RAG systems can provide ongoing adaptability to new data, while fine-tuned models may require periodic retraining to maintain performance as data evolves.

### Conclusion

In summary, the decision between RAG, fine-tuning, and PEFT approaches should be guided by the specific needs of the organization, including data availability, resource constraints, application requirements, and long-term maintenance considerations. By carefully evaluating these factors, companies can select the most suitable method to enhance their AI capabilities effectively.

### Global Search

<img src="./media/global_search.png" width=1000>

Through the semantic clustering of communities during the indexing process outlined above, we created community reports that summarize high-level themes across these groupings. Having these community summaries at multiple levels lets us do something traditional RAG does poorly: answer queries about broad themes and ideas across our unstructured data.

To capture as much broad information as possible in an efficient manner, GraphRAG implements a [map-reduce](https://en.wikipedia.org/wiki/MapReduce) approach. Given a query, the community reports at a specified hierarchical level are retrieved, shuffled, and split into chunks. In the map step, each chunk is used to generate a list of points, each with its own "importance score". In the reduce step, these intermediate points are ranked and filtered to keep the most important ones, and the aggregated result is passed to the LLM as the context for the final response.
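The shuffle, batch, map, and reduce steps can be sketched as follows. This is a toy stand-in, not GraphRAG's code: in GraphRAG both the map and reduce steps are LLM calls, whereas here the hypothetical `map_step` scores each report by how many query words it shares, and the reports, batch size, and thresholds are made up for illustration.

```python
import random
import re

# Hypothetical community reports at one level of the hierarchy.
reports = [
    "LoRA and QLoRA reduce the cost of fine-tuning large models.",
    "RAG systems ground answers in retrieved documents.",
    "PyTorch underpins most LLM training pipelines.",
    "Hugging Face hosts models and PEFT libraries.",
]

def tokens(text):
    """Lowercase word tokens (letters, digits, hyphens)."""
    return set(re.findall(r"[a-z0-9-]+", text.lower()))

def map_step(batch, query_terms):
    """Stand-in for the LLM map call: emit (point, importance score) pairs.

    The score here is the count of query terms found in the report; GraphRAG
    instead asks the LLM to rate each point it extracts.
    """
    return [(report, len(tokens(report) & query_terms)) for report in batch]

def reduce_step(all_points, min_score=1, max_points=3):
    """Filter out low-importance points and keep the highest-scoring ones."""
    kept = [p for p in all_points if p[1] >= min_score]
    kept.sort(key=lambda p: p[1], reverse=True)
    return [point for point, _ in kept[:max_points]]

def global_search_context(query, batch_size=2, seed=0):
    query_terms = tokens(query)
    shuffled = reports[:]
    random.Random(seed).shuffle(shuffled)  # shuffle before batching
    batches = [shuffled[i:i + batch_size] for i in range(0, len(shuffled), batch_size)]
    # Map: each batch is scored independently (these would run in parallel).
    intermediate = [pt for batch in batches for pt in map_step(batch, query_terms)]
    # Reduce: the filtered, ranked points become the context for the final answer.
    return reduce_step(intermediate)

context = global_search_context("rag or lora for fine-tuning")
print(context)
```

Because every batch is handled independently in the map step, the approach scales to many reports; only the short list of high-scoring points reaches the final prompt.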

In [None]:
result = query_graphrag(
    query="How does a company choose between RAG, fine-tuning, and different PEFT approaches?",
    method="global"
)
print("Query result:")
print(result)

# Choosing Between RAG, Fine-Tuning, and PEFT Approaches

When a company is faced with the decision of selecting between Retrieval-Augmented Generation (RAG), fine-tuning, and various Parameter-Efficient Fine-Tuning (PEFT) approaches, several critical factors must be considered. Each method has its unique advantages and trade-offs, which can significantly impact the performance and efficiency of AI applications.

## Understanding the Approaches

### Retrieval-Augmented Generation (RAG)

RAG is particularly advantageous for applications that require real-time data integration and contextually relevant responses. It enhances language models by incorporating external data, which can significantly improve response accuracy and relevance. This method is ideal for scenarios where up-to-date information is critical, such as customer support or content generation [Data: Reports (7, 55, +more)].

### Fine-Tuning

Fine-tuning is essential for adapting pre-trained models to specific tasks, allowing for the incorporation of domain-specific knowledge. This approach is most effective when a company has a well-defined task and sufficient labeled data to train the model effectively. Fine-tuning typically requires more extensive computational resources and time, as it involves retraining model weights on a specific dataset [Data: Reports (36)].

### Parameter-Efficient Fine-Tuning (PEFT)

PEFT approaches, such as Low-Rank Adaptation (LoRA) and Odds-Ratio Preference Optimization (ORPO), provide a way to align model outputs with desired responses without extensive retraining. These methods are particularly beneficial in environments with limited computational resources, allowing for effective deployment of models without incurring high costs. PEFT techniques focus on optimizing model performance while using fewer parameters, making them suitable for scenarios where rapid deployment is necessary [Data: Reports (10, 30)].

## Key Considerations for Decision-Making

### Use Case and Requirements

The specific use case and requirements of the application shall be the primary drivers in the decision-making process. Companies must evaluate whether they need real-time data integration (favoring RAG), domain-specific adaptation (favoring fine-tuning), or efficient model adaptation with limited resources (favoring PEFT) [Data: Reports (5, 21, 24, 32, 70)].

### Data Availability

The nature and availability of data are crucial factors. Fine-tuning is most effective when there is a substantial amount of labeled data available for training. In contrast, RAG can leverage existing knowledge bases, which may reduce the need for extensive retraining but requires a robust retrieval mechanism [Data: Reports (5, 21, 24, 32, 70)].

### Computational Resources

Companies must assess their computational resources and operational constraints. Fine-tuning large models typically requires significant computational power, which may not be feasible for all organizations. PEFT methods, on the other hand, allow for effective model adaptation without extensive retraining, making them more accessible for organizations with limited resources [Data: Reports (6, 10)].

### Performance vs. Resource Efficiency

The balance between performance needs and resource efficiency is another critical consideration. Fine-tuning can lead to high-performing models tailored to specific tasks, but it may also result in overfitting if not managed properly. RAG can provide up-to-date responses and reduce inaccuracies, but it requires a robust data retrieval system to function effectively. PEFT approaches aim to balance performance and efficiency, making them attractive for organizations looking to optimize their AI capabilities without incurring high costs [Data: Reports (10, 52)].

### Scalability and Maintainability

Scalability and maintainability of the chosen approach shall also be evaluated. RAG systems can be more complex to maintain due to the need for an effective retrieval system, while fine-tuned models may require ongoing updates as new data becomes available. PEFT approaches can offer a more modular solution, allowing for easier updates and adaptations as new tasks arise [Data: Reports (5, 21, 24, 32, 70)].

### Technical Expertise

The organization's technical expertise and existing infrastructure may influence the choice of approach. Companies with strong data science teams may prefer fine-tuning, while those looking for quick deployment with less technical overhead might opt for RAG or PEFT methods [Data: Reports (12, 36, 30)].

## Conclusion

Ultimately, the decision between RAG, fine-tuning, and PEFT approaches shall be guided by a thorough analysis of the company's goals, data availability, and resource constraints. Companies may benefit from testing different methods to assess their effectiveness in meeting specific performance metrics and user satisfaction. By carefully considering these factors, organizations can select the most suitable approach for their AI applications, ensuring optimal performance and resource utilization [Data: Reports (12, 36, 30)].

### DRIFT Search

<img src="./media/drift_search.png" width=1000>

[Dynamic Reasoning and Inference with Flexible Traversal](https://www.microsoft.com/en-us/research/blog/introducing-drift-search-combining-global-and-local-search-methods-to-improve-quality-and-efficiency/), or DRIFT, is a novel GraphRAG concept introduced by Microsoft as an approach to local search queries that brings community information into the search process.

The user's query is first processed with [Hypothetical Document Embedding (HyDE)](https://arxiv.org/pdf/2212.10496), which generates a hypothetical document on the query's topic, similar in form to the documents already in the graph. This document is embedded and used for semantic retrieval of the top-k most relevant community reports. From these matches, DRIFT generates an initial answer along with several follow-up questions, a lightweight version of global search that Microsoft calls the primer.
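The HyDE retrieval step can be sketched as follows. This is an illustrative sketch under stated assumptions: `generate_hypothetical_doc` is a hypothetical stand-in for the LLM call (it returns a fixed template), `embed` is a toy bag-of-words vector rather than a neural embedding model, and the community reports are made up. The key idea it shows is that the *hypothetical document*, not the raw query, is what gets embedded and compared against the reports.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def generate_hypothetical_doc(query):
    """Hypothetical stand-in for the LLM call that drafts a document answering the query."""
    return f"This report discusses {query}. It covers fine-tuning, LoRA adapters and retrieval."

# Hypothetical community reports.
community_reports = {
    "c1": "Community of PEFT methods: LoRA adapters and QLoRA reduce fine-tuning cost.",
    "c2": "Community around retrieval: RAG pipelines and vector databases.",
    "c3": "Community around hardware: GPUs and TPUs for training.",
}

def hyde_top_k(query, k=2):
    # Embed the hypothetical document instead of the raw query.
    hyde_vector = embed(generate_hypothetical_doc(query))
    scores = {cid: cosine(hyde_vector, embed(text)) for cid, text in community_reports.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(hyde_top_k("choosing between RAG and PEFT"))
```

The top-k reports returned here would then feed the primer prompt that produces the initial answer and follow-up questions.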

Once the primer phase is complete, we run a local search for each generated follow-up question. Each local search produces both intermediate answers and new follow-up questions, creating a refinement loop. This loop runs for two iterations (Microsoft notes that future research is planned on reward functions for smarter termination). What makes these local searches distinctive is that they are informed by both community-level knowledge and detailed entity/relationship data. This allows DRIFT to find relevant information even when the initial query diverges from the indexing persona, and to adapt its approach based on information that emerges during the search.

The final output is structured as a hierarchy of questions and answers, ranked by their relevance to the original query. Map-reduce is used again, with equal weighting on all intermediate answers, and the result is passed to the language model for a final response. DRIFT cleverly combines global and local search with guided exploration to provide both broad context and specific details in responses.
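The control flow after the primer (a fixed-depth refinement loop that builds a question/answer hierarchy, followed by an equal-weight reduce) can be sketched as below. This only illustrates the loop structure: `primer` and `local_search` are hypothetical stubs returning canned strings, whereas real DRIFT uses LLM calls and also ranks follow-up questions by relevance, which this sketch omits.

```python
from dataclasses import dataclass, field

@dataclass
class QANode:
    question: str
    answer: str
    children: list = field(default_factory=list)

def primer(query):
    """Hypothetical stub for the primer: an initial answer plus follow-up questions."""
    return "Initial answer from community reports.", [f"{query} (data?)", f"{query} (compute?)"]

def local_search(question):
    """Hypothetical stub for a local search that answers and proposes new follow-ups."""
    return f"Answer to: {question}", [f"{question} -> detail"]

def drift(query, depth=2):
    answer, follow_ups = primer(query)
    root = QANode(query, answer)
    frontier = [(root, q) for q in follow_ups]
    for _ in range(depth):  # fixed number of refinement iterations
        next_frontier = []
        for parent, question in frontier:
            ans, new_questions = local_search(question)
            node = QANode(question, ans)
            parent.children.append(node)
            next_frontier.extend((node, q) for q in new_questions)
        frontier = next_frontier
    return root

def collect_answers(node):
    """Flatten the Q&A hierarchy into a list of answers."""
    answers = [node.answer]
    for child in node.children:
        answers.extend(collect_answers(child))
    return answers

tree = drift("RAG vs PEFT")
answers = collect_answers(tree)
# Equal weighting: every intermediate answer contributes the same share to the reduce step.
weights = [1 / len(answers)] * len(answers)
print(len(answers), weights[0])
```

Capping the loop at a fixed depth keeps cost predictable; the breadth of the tree is driven by how many follow-ups each step proposes.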

In [None]:
result = query_graphrag(
    query="How does a company choose between RAG, fine-tuning, and different PEFT approaches?",
    method="drift"
)
print("Query result:")
print(result)

# Choosing Between RAG, Fine-Tuning, and PEFT Approaches

When a company is faced with the decision of selecting between Retrieval-Augmented Generation (RAG), fine-tuning, and various Parameter-Efficient Fine-Tuning (PEFT) approaches, several critical factors come into play. Each method has its unique strengths and applications, making the choice dependent on specific project requirements, resource availability, and desired outcomes.

## 1. Understanding the Methods

### Fine-Tuning
Fine-tuning involves taking a pre-trained model and training it further on a smaller, task-specific dataset. This process allows the model to adapt its parameters to better suit specific tasks, leading to improved performance in areas such as text classification, sentiment analysis, and question-answering. Companies may opt for fine-tuning when they have access to substantial computational resources and a well-defined dataset that aligns with their objectives [Data: Sources (1, 2, 3)].

### Retrieval-Augmented Generation (RAG)
RAG enhances language models by integrating external data into the response generation process. This method is particularly effective for tasks requiring up-to-date information or when the dataset is too large to fine-tune effectively. Companies may choose RAG when they need to ensure that their models can access and utilize external knowledge dynamically, which is crucial for applications like chatbots or question-answering systems that require real-time data [Data: Sources (1, 2, 3)].

### Parameter-Efficient Fine-Tuning (PEFT)
PEFT techniques, such as Low-Rank Adaptation (LoRA) and Quantised LoRA (QLoRA), focus on optimizing the fine-tuning process by adjusting fewer parameters. This method is advantageous for organizations with limited computational resources, as it reduces the memory and processing power required for training. Companies may prefer PEFT when they aim to achieve efficient model adaptation while minimizing costs [Data: Sources (1, 2, 3)].

## 2. Key Considerations for Decision-Making

### Task Requirements
The specific requirements of the task at hand play a significant role in determining the fine-tuning approach. For instance, tasks that involve nuanced language understanding may necessitate more extensive fine-tuning, while those requiring real-time data integration may benefit more from RAG [Data: Sources (1, 2)].

### Data Availability
The amount and quality of available data are crucial. Fine-tuning typically requires a smaller, task-specific dataset, while RAG can leverage existing data without extensive retraining. If data is scarce, PEFT methods can be advantageous, as they allow for effective adaptation with fewer parameters being updated [Data: Sources (1, 2)].

### Computational Resources
The computational resources available for fine-tuning are a significant factor. Full fine-tuning of large models can be resource-intensive, requiring substantial GPU or TPU capabilities. In scenarios where resources are limited, PEFT methods can provide a more efficient alternative by updating only a subset of model parameters [Data: Sources (1, 2)].

### Performance Outcomes
Ultimately, the choice between RAG, fine-tuning, and PEFT approaches depends on the specific needs of the organization, including the desired accuracy, the nature of the tasks, and the importance of real-time data integration. Companies must weigh these factors carefully to select the most suitable approach for their AI applications, ensuring that they achieve the best possible outcomes in terms of performance and user satisfaction [Data: Sources (1, 2)].

## Conclusion

In summary, the decision-making process for choosing between RAG, fine-tuning, and PEFT approaches involves a careful evaluation of task requirements, data availability, computational resources, and desired performance outcomes. By understanding the strengths and limitations of each method, organizations can make informed choices that align with their strategic goals in AI development.

---

## Comparing to Regular Vector Database Retrieval

<img src="./media/basic_retrieval.png" width=600>
 
For comparison, let's look back at traditional RAG: chunking, embedding, and similarity-based retrieval.
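Stripped to its essentials, this baseline splits text into chunks, embeds each chunk, and returns the chunks nearest to the query embedding. The dependency-free sketch below assumes a fixed-size character chunker with overlap and a toy bag-of-words embedding scored with cosine similarity; these are illustrative choices, not how the chunks in this notebook were produced. Chroma, used below, instead embeds with a neural model and searches an HNSW index (with squared L2 distance by default).

```python
import math
import re
from collections import Counter

def chunk_text(text, size=60, overlap=15):
    """Split text into fixed-size character windows that overlap by `overlap` chars."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]

def embed(text):
    """Toy bag-of-words vector standing in for a neural embedding."""
    return Counter(re.findall(r"[a-z0-9-]+", text.lower()))

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query, chunks, n_results=2):
    """Return the n chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:n_results]

doc = (
    "LoRA freezes the base model and trains small low-rank matrices. "
    "RAG retrieves relevant documents at query time and adds them to the prompt. "
    "Full fine-tuning updates every weight and needs the most GPU memory."
)
chunks = chunk_text(doc)
print(len(chunks))
print(retrieve("lora low-rank", chunks, n_results=1))
```

Note how the character windows cut through words ("matric"), a small-scale version of the chunk-boundary fragmentation discussed at the end of this notebook.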

**Instantiate our Database**

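For this we'll use [ChromaDB](https://www.trychroma.com) with the same chunks that were loaded into our graph. Chroma embeds the documents with its default embedding function (all-MiniLM-L6-v2), which is downloaded on first use.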

In [31]:
!pip install chromadb -q

In [32]:
import chromadb

chroma_client = chromadb.PersistentClient(path="./notebook/chromadb")
paper_collection = chroma_client.get_or_create_collection(name="paper_collection")

**Embed Chunks Into Collection**

In [33]:
for i, text in enumerate(texts):
    paper_collection.add(
        documents=[text],
        ids=[f"chunk_{i}"]
    )



**Retrieval Function**

In [34]:
def chroma_retrieval(query, num_results=5):
    results = paper_collection.query(
        query_texts=[query],
        n_results=num_results
    )
    return results

**RAG Prompt & Chain**

In [35]:
rag_prompt_template = """
Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.

If you don't know the answer, just say so. Do not make anything up.

Do not include information where the supporting evidence for it is not provided.

Context: {retrieved_docs}

User Question: {query}

"""

rag_prompt = ChatPromptTemplate.from_template(rag_prompt_template)

rag_chain = rag_prompt | llm | StrOutputParser()

In [36]:
chroma_retrieval(query="finetuning vs RAG vs PEFT")

{'ids': [['chunk_3', 'chunk_4', 'chunk_2', 'chunk_15', 'chunk_23']],
 'embeddings': None,
 'documents': [["'s decision-making process. In that case, RAG systems offer insight that is typically not available in models that are solely fine-tuned. Figure1.5 illustrates the visual representation alongside example use cases.\n\n<!-- missing-text -->\n\n1.10 Objectives of the Report\n\n1.10.1 Goals and Scope\n\nThe primary goal of this report is to conduct a comprehensive analysis of fine-tuning techniques for LLMs. This involves exploring theoretical foundations, practical implementation strategies, and challenges. The report examines various fine-tuning methodologies, their applications, and recent advancements.\n\n1.10.2 Key Questions and Issues Addressed\n\nThis report addresses critical questions surrounding fine-tuning LLMs, starting with foundational insights into LLMs, their evolution, and significance in NLP. It defines fine-tuning, distinguishes it from pre-training, and emphasises

**RAG Function**

In [37]:
def chroma_rag(query):
    retrieved_docs = chroma_retrieval(query)["documents"][0]
    response = rag_chain.invoke({"retrieved_docs": retrieved_docs, "query": query})
    return response

**RAG Response**

In [38]:
response = chroma_rag("How does a company choose between RAG, fine-tuning, and different PEFT approaches?")
print(response)

When a company is deciding between Retrieval-Augmented Generation (RAG), fine-tuning, and various Parameter-Efficient Fine-Tuning (PEFT) approaches, several factors come into play:

1. **Data Availability**: If a company has ample domain-specific, labeled training data, fine-tuning may be more suitable as it allows for tailored model behavior. Conversely, if such data is scarce, RAG systems can provide a robust alternative by leveraging external data sources without extensive model adjustments.

2. **Task Requirements**: RAG is ideal for applications needing real-time access to external data, enhancing the model's responses with current information. Fine-tuning is better for tasks requiring specific behavioral adjustments or writing styles.

3. **Performance and Accuracy**: RAG systems tend to perform better in suppressing hallucinations and ensuring accuracy, as they ground outputs in relevant knowledge. Fine-tuning can lead to a more customized model but may risk overfitting if not m

---
## Discussion

**Traditional/Naive RAG:**

Benefits:
- Simpler implementation and deployment
- Works well for straightforward information retrieval tasks
- Good at handling unstructured text data
- Lower computational overhead

Drawbacks:
- Loses structural information when chunking documents
- Can break up related content during text segmentation
- Limited ability to capture relationships between different pieces of information
- May struggle with complex reasoning tasks requiring connecting multiple facts
- Potential for incomplete or fragmented answers due to chunking boundaries

**GraphRAG:**

Benefits:
- Preserves structural relationships and hierarchies in the knowledge
- Better at capturing connections between related information
- Can provide more complete and contextual answers
- Improved retrieval accuracy by leveraging graph structure
- Better supports complex reasoning across multiple facts
- Can maintain document coherence better than chunk-based approaches
- More interpretable due to explicit knowledge representation

Drawbacks:
- More complex to implement and maintain
- Requires additional processing to construct and update knowledge graphs
- Higher computational overhead for graph operations
- May require domain expertise to define graph schema/structure
- More challenging to scale to very large datasets
- Additional storage requirements for graph structure

**Key Differentiators:**
1. Knowledge Representation: Traditional RAG treats everything as flat text chunks, while GraphRAG maintains structured relationships in a graph format

2. Context Preservation: GraphRAG better preserves context and relationships between different pieces of information compared to the chunking approach of traditional RAG

3. Reasoning Capability: GraphRAG enables better multi-hop reasoning and connection of related facts through graph traversal, while traditional RAG is more limited to direct retrieval

4. Answer Quality: GraphRAG tends to produce more complete and coherent answers since it can access related information through graph connections rather than being limited by chunk boundaries
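To make "explicit connection paths" concrete, here is a small multi-hop example: a breadth-first search that returns the chain of relationships linking two entities. This is a generic graph algorithm on a hypothetical toy graph, not a GraphRAG API; the entities echo ones in this notebook, but the relationship labels are made up for illustration.

```python
from collections import deque

# Hypothetical relationships: (source, relation, target)
edges = [
    ("QLoRA", "extends", "LoRA"),
    ("LoRA", "is a", "PEFT method"),
    ("PEFT method", "adapts", "LLM"),
    ("LLM", "trained with", "PyTorch"),
    ("Hugging Face", "hosts", "LLM"),
]

def build_adjacency(edges):
    adj = {}
    for src, rel, dst in edges:
        # Store both directions so a path can be walked either way.
        adj.setdefault(src, []).append((rel, dst))
        adj.setdefault(dst, []).append((f"inverse of '{rel}'", src))
    return adj

def find_path(adj, start, goal):
    """Breadth-first search returning the shortest chain of (entity, relation, entity) hops."""
    queue = deque([start])
    came_from = {start: None}
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for rel, nxt in adj.get(node, []):
            if nxt not in came_from:
                came_from[nxt] = (node, rel)
                queue.append(nxt)
    if goal not in came_from:
        return None
    hops, node = [], goal
    while came_from[node] is not None:
        prev, rel = came_from[node]
        hops.append((prev, rel, node))
        node = prev
    return list(reversed(hops))

adj = build_adjacency(edges)
for src, rel, dst in find_path(adj, "QLoRA", "PyTorch"):
    print(f"{src} --{rel}--> {dst}")
```

Each printed hop is an explicit, inspectable reason why two facts are connected, which flat chunk retrieval cannot provide when the facts sit in different chunks.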

The choice between traditional RAG and GraphRAG often depends on the specific use case, with GraphRAG being particularly valuable when maintaining relationships between pieces of information is important or when complex reasoning is required. Note also that GraphRAG approaches still rely on regular embedding and retrieval methods themselves. They complement each other!