# Knowledge Graph RAG

<img src="./media/graph_start.png" width=600>

*[Improving Knowledge Graph Completion with Generative LM and neighbors](https://deeppavlov.ai/research/tpost/bn15u1y4v1-improving-knowledge-graph-completion-wit)*

In the evolving landscape of AI and information retrieval, knowledge graphs have emerged as a powerful way to represent complex, interconnected information. A knowledge graph is a knowledge base that uses a graph-structured data model or topology to represent and operate on data. Knowledge graphs are often used to store interlinked descriptions of entities – objects, events, situations or abstract concepts – while also encoding the free-form semantics or relationships underlying these entities. [Source: Wikipedia](https://en.wikipedia.org/wiki/Knowledge_graph)

What makes knowledge graphs particularly powerful is that they organize data in a way that resembles human cognition. They explicitly map the relationships between objects, concepts, and ideas through both semantic and relational connections. This closely parallels how we naturally understand and internalize information: not as isolated facts, but as a web of interconnected concepts and relationships.

<img src="./media/coffee_graph_ex.png" width=400>

Looking at a concept like "coffee," we don't just know it's a beverage; we automatically connect it to related concepts like beans, brewing methods, caffeine, morning routines, and social interactions. Knowledge graphs capture these natural associations in a structured way.
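At its simplest, a knowledge graph is a set of (subject, relation, object) triples. The hypothetical triples below loosely follow the coffee example and are indexed into an adjacency map so a concept's connections can be listed directly. This illustrates the idea only; it is not how GraphRAG stores its graph.

```python
from collections import defaultdict

# Hypothetical triples loosely based on the coffee example above
triples = [
    ("COFFEE", "is a", "BEVERAGE"),
    ("COFFEE", "is made from", "COFFEE BEANS"),
    ("COFFEE", "is prepared by", "BREWING"),
    ("COFFEE", "contains", "CAFFEINE"),
    ("COFFEE", "is part of", "MORNING ROUTINE"),
    ("COFFEE", "is shared during", "SOCIAL INTERACTIONS"),
]

def build_graph(triples):
    """Index triples as an adjacency map: subject -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph

graph = build_graph(triples)
for relation, neighbor in graph["COFFEE"]:
    print(f"COFFEE --{relation}--> {neighbor}")
```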

Traditional RAG systems, while effective at semantic similarity-based retrieval, often struggle to capture broader conceptual relationships across text chunks. Knowledge Graph RAG addresses this limitation by introducing a structured, hierarchical approach to information organization and retrieval. By representing data in a graph format, these systems can traverse relationships between concepts, enabling more sophisticated query understanding and response generation. This approach allows for targeted querying along specific relationship paths, handles complex multi-hop questions, and provides clearer reasoning through explicit connection paths. The result is a more nuanced and interpretable system that combines the structured reasoning of knowledge graphs with the natural language capabilities of large language models.
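Multi-hop questions come down to following chains of relationships. Here is a minimal sketch, assuming an in-memory list of (source, relation, target) triples with toy entity names loosely based on the relationship table later in this notebook; it is not GraphRAG's retrieval code. Breadth-first search finds the shortest chain linking two entities and returns it as an explicit, inspectable path.

```python
from collections import deque

def find_path(edges, start, goal):
    """Shortest chain of relationships from start to goal (breadth-first search).

    `edges` are (source, relation, target) triples. They are traversed in either
    direction, but each hop is reported as the original triple so it stays readable.
    Returns a list of triples, or None if the two entities are not connected.
    """
    adjacency = {}
    for edge in edges:
        src, _, dst = edge
        adjacency.setdefault(src, []).append((dst, edge))
        adjacency.setdefault(dst, []).append((src, edge))

    came_from = {start: None}  # node -> (previous node, edge used to reach it)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for neighbor, edge in adjacency.get(node, []):
            if neighbor not in came_from:
                came_from[neighbor] = (node, edge)
                queue.append(neighbor)

    if goal not in came_from:
        return None
    # Walk backwards from the goal to rebuild the hop sequence
    path, node = [], goal
    while came_from[node] is not None:
        node, edge = came_from[node]
        path.append(edge)
    return path[::-1]

# Toy edges loosely based on the relationship descriptions shown later
edges = [
    ("GPT-4", "is an advanced version of", "GPT-3"),
    ("RLHF", "is used in training", "GPT-3"),
    ("CHATGPT", "is based on", "GPT-3"),
]
for hop in find_path(edges, "GPT-4", "RLHF"):
    print(hop)
```

Because each hop is a stored triple, the answer comes with its own explanation: the path itself shows why two entities are related.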

While [knowledge graphs are not a new concept](https://blog.google/products/search/introducing-knowledge-graph-things-not/), their creation has traditionally been a resource-intensive process. Early knowledge graphs were built either through manual curation by domain experts or by converting existing structured data from relational databases. This limited both their scale and adaptability to new domains.

<img src="./media/table_comp.png" width=600>

*[What is a Knowledge Graph (KG)?](https://zilliz.com/learn/what-is-knowledge-graph)*

The introduction of LLMs has transformed this landscape. Their capabilities in natural language processing, reasoning, and relationship extraction now enable automated construction of knowledge graphs from unstructured text. These models can identify entities, infer relationships, and structure information in ways that previously required extensive manual labor. This also allows knowledge graphs to be updated and expanded as new information becomes available, making them more practical and scalable for real-world applications.

To see this in action and compare it with traditional vector similarity techniques, we'll look at Microsoft's open-source [GraphRAG](https://microsoft.github.io/graphrag/) and how it works behind the scenes.

---
## 3 Main Components of Knowledge Graphs

**Entity**

<img src="./media/entities.png" width=500>

An Entity is a distinct object, person, place, event, or concept that has been extracted from a chunk of text through LLM analysis. Entities form the nodes of the knowledge graph. During the creation of the knowledge graph, when duplicate entities are found they are merged while preserving their various descriptions, creating a comprehensive representation of each unique entity.
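Entity merging can be sketched in a few lines. This is a minimal illustration, not GraphRAG's implementation: it assumes entities are matched on an upper-cased name plus type (a matching rule chosen for this sketch) and keeps every distinct description. A production pipeline may then condense the collected descriptions into one, for example with another LLM call. The extracted records are hypothetical.

```python
def merge_entities(extracted):
    """Merge duplicate entities extracted from different chunks.

    Entities are keyed on (upper-cased name, type). Each unique entity keeps a
    list of its distinct descriptions in first-seen order.
    """
    merged = {}
    for entity in extracted:
        key = (entity["name"].strip().upper(), entity["type"])
        record = merged.setdefault(
            key, {"name": key[0], "type": key[1], "descriptions": []}
        )
        if entity["description"] not in record["descriptions"]:
            record["descriptions"].append(entity["description"])
    return list(merged.values())

# Hypothetical extractions from two different chunks
extracted = [
    {"name": "Cross-Entropy", "type": "evaluation metrics",
     "description": "A metric for evaluating LLMs during training."},
    {"name": "CROSS-ENTROPY", "type": "evaluation metrics",
     "description": "Quantifies the difference between predicted and actual distributions."},
    {"name": "PERPLEXITY", "type": "evaluation metrics",
     "description": "Measures how well a model predicts a sample."},
]
for entity in merge_entities(extracted):
    print(entity["name"], len(entity["descriptions"]))
```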

**Relationship**

<img src="./media/relationship.png" width=400>

A Relationship defines a connection between two entities in the knowledge graph. These connections are extracted directly from text units through LLM analysis, alongside entities. Each relationship includes a source entity, target entity, and descriptive information about their connection. When duplicate relationships are found between the same entities, they are merged by combining their descriptions to create a more complete understanding of the connection.
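Relationship merging follows the same pattern, keyed on the (source, target) pair. In this sketch the 1 to 10 strength scores from extraction are summed into a single weight; that aggregation rule is an assumption, since the text above only says descriptions are combined. The sample relationships are hypothetical.

```python
def merge_relationships(extracted):
    """Merge duplicate relationships that share the same (source, target) pair.

    Descriptions are combined into one list; strength scores are summed into a
    single weight (summing is an assumption of this sketch).
    """
    merged = {}
    for rel in extracted:
        key = (rel["source"], rel["target"])
        record = merged.setdefault(
            key,
            {"source": key[0], "target": key[1], "descriptions": [], "weight": 0.0},
        )
        record["descriptions"].append(rel["description"])
        record["weight"] += rel["strength"]
    return list(merged.values())

# Hypothetical extractions of the same connection from two chunks
extracted = [
    {"source": "GPT-3", "target": "GPT-4", "description": "GPT-4 succeeds GPT-3.", "strength": 5},
    {"source": "GPT-3", "target": "GPT-4", "description": "GPT-4 builds on GPT-3.", "strength": 3},
    {"source": "GPT-3", "target": "BERT", "description": "Both are pre-trained language models.", "strength": 6},
]
for rel in merge_relationships(extracted):
    print(rel["source"], "->", rel["target"], rel["weight"], rel["descriptions"])
```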

**Community**

<img src="./media/communities.png" width=400>

A Community is a cluster of related entities and relationships identified through hierarchical community detection, generally using the [Leiden Algorithm](https://en.wikipedia.org/wiki/Leiden_algorithm). Communities create a structured way to understand different levels of granularity within the knowledge graph, from broad overviews at the top level to detailed local clusters at lower levels. This hierarchical structure helps in organizing and navigating complex knowledge graphs.
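Leiden, like its predecessor Louvain, searches for a partition of the graph that scores well on a quality function, most commonly modularity: the fraction of edges that fall inside communities minus the fraction expected if edges were placed at random while preserving node degrees. Implementing Leiden itself is beyond a short example, so the sketch below only shows how modularity scores a given partition, on a hypothetical toy graph of two triangles joined by a single edge.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [ L_c / m - (d_c / 2m)^2 ], where m is the number
    of edges, L_c the number of edges inside c, and d_c the total degree of c's nodes.
    """
    m = len(edges)
    internal_edges = defaultdict(int)
    total_degree = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        total_degree[cu] += 1
        total_degree[cv] += 1
        if cu == cv:
            internal_edges[cu] += 1
    return sum(
        internal_edges[c] / m - (total_degree[c] / (2 * m)) ** 2 for c in total_degree
    )

# Hypothetical toy graph: two triangles joined by a single bridge edge (c-d)
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
single = {node: 0 for node in split}

print(round(modularity(edges, split), 3))   # → 0.357 (each triangle is a community)
print(round(modularity(edges, single), 3))  # → 0.0 (everything in one community)
```

The two-triangle split scores higher, which is the kind of grouping a modularity-driven algorithm such as Leiden would prefer.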

---
## GraphRAG Creation Data Flow

<img src=./media/graph_building.png width=1000>

Indexing in GraphRAG is an extensive process: we load the documents, split them into chunks, extract a subgraph from each chunk, merge these subgraphs into the final graph, algorithmically identify communities, and then summarize each community's main features.

### **Loading and Splitting Our Text**

For our example, we'll be using [The Ultimate Guide to Fine-Tuning LLMs from Basics to Breakthroughs: An Exhaustive Review of Technologies, Research, Best Practices, Applied Research Challenges and Opportunities](https://arxiv.org/pdf/2408.13296).

The paper is loaded as text (with the index, glossary, and references removed) and split into 1,200-token chunks with a 100-token overlap.
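Overlapping token windows are straightforward to express. The sketch below works on a plain list of token IDs so it has no tokenizer dependency; with tiktoken you would pass `encoding.encode(text)` and call `encoding.decode(...)` on each window. To our understanding this mirrors the windowing that `TokenTextSplitter` performs, but treat it as an approximation rather than LangChain's code.

```python
def split_token_windows(tokens, chunk_size=1200, chunk_overlap=100):
    """Split a token sequence into fixed-size windows that overlap.

    Each window starts `chunk_size - chunk_overlap` tokens after the previous one,
    so consecutive chunks share `chunk_overlap` tokens of context.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    start = 0
    while start < len(tokens):
        end = min(start + chunk_size, len(tokens))
        chunks.append(tokens[start:end])
        if end == len(tokens):
            break  # the last window already reaches the end of the text
        start += step
    return chunks

# Usage with a fake token sequence the length of 01.md (2,656 tokens)
chunks = split_token_windows(list(range(2656)))
print([len(chunk) for chunk in chunks])     # → [1200, 1200, 456]
print(chunks[0][-100:] == chunks[1][:100])  # → True: consecutive chunks share 100 tokens
```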

```python
from llama_index.core import SimpleDirectoryReader

documents = SimpleDirectoryReader("./kgdata/").load_data()
```

```python
import pandas as pd
import tiktoken
from IPython.display import display

# Initialize the cl100k_base tokenizer (used by GPT-4 models)
encoding = tiktoken.get_encoding("cl100k_base")

# Function to count tokens in a text
def count_tokens(text):
    return len(encoding.encode(text))

# Create a dataframe with document information and token counts
doc_info = []
total_tokens = 0

for i, doc in enumerate(documents):
    filename = doc.metadata.get('file_name', f'Document {i}')
    token_count = count_tokens(doc.text)
    total_tokens += token_count
    doc_info.append({
        'Document': filename,
        'Tokens': token_count,
        'Characters': len(doc.text)
    })

# Create DataFrame and display
doc_df = pd.DataFrame(doc_info)
display(doc_df)

print(f"Total tokens across all documents: {total_tokens}")
print(f"Average tokens per document: {total_tokens / len(documents):.1f}")
```

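| | Document | Tokens | Characters |
|---|---|---|---|
| 0 | 01.md | 2656 | 8569 |
| 1 | 02.md | 2139 | 7333 |
| 2 | 03.md | 1944 | 6596 |
| 3 | 04.md | 2050 | 6934 |
| 4 | 05.md | 3078 | 10185 |
| 5 | 06.md | 3815 | 12437 |
| 6 | 07.md | 3717 | 11991 |
| 7 | 08.md | 2038 | 6993 |
| 8 | 09.md | 2548 | 8475 |
| 9 | 10.md | 3056 | 9977 |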


```plaintext
Total tokens across all documents: 105900
Average tokens per document: 2862.2
```


```python
from langchain_text_splitters import TokenTextSplitter
import time

# Define a simple progress tracking function that doesn't require tqdm
def simple_progress(iterable, desc="Processing"):
    """A simple progress tracker for notebooks that doesn't require ipywidgets."""
    total = len(iterable)
    start_time = time.time()
    print(f"{desc}: 0/{total} (0%)")

    for i, item in enumerate(iterable):
        yield item
        # Report every 5% of items (every item when there are fewer than 40)
        if (i+1) % max(1, total//20) == 0 or i+1 == total:
            elapsed = time.time() - start_time
            percent = (i+1)/total*100
            items_per_sec = (i+1)/elapsed if elapsed > 0 else 0
            print(f"{desc}: {i+1}/{total} ({percent:.1f}%) - {items_per_sec:.1f} items/s")

# Define chunking parameters
chunk_size = 1200
chunk_overlap = 100

# Initialize text splitter with our parameters
text_splitter = TokenTextSplitter(chunk_size=chunk_size, chunk_overlap=chunk_overlap)

# Dictionary to store chunks for each document
document_chunks = {}

# Process each document
print(f"Splitting documents into chunks (size={chunk_size}, overlap={chunk_overlap})")
total_chunks = 0

for doc in simple_progress(documents, desc="Processing documents"):
    filename = doc.metadata.get('file_name')
    # Split the document text into chunks
    chunks = text_splitter.split_text(doc.text)
    # Store the chunks
    document_chunks[filename] = chunks
    total_chunks += len(chunks)

print(f"Created {total_chunks} total chunks across {len(documents)} documents")

# Display summary of chunks per document
chunk_summary = []
for filename, chunks in document_chunks.items():
    chunk_summary.append({
        'Document': filename,
        'Chunks': len(chunks),
        # Average chunk length is measured in characters, not tokens
        'Avg. Chunk Length': sum(len(chunk) for chunk in chunks) / len(chunks) if chunks else 0
    })

# Create DataFrame and display
chunk_df = pd.DataFrame(chunk_summary)
display(chunk_df.sort_values(by='Chunks', ascending=False).head(10))
```

```plaintext
Splitting documents into chunks (size=1200, overlap=100)
Processing documents: 0/37 (0%)
Processing documents: 1/37 (2.7%) - 136.5 items/s
Processing documents: 2/37 (5.4%) - 168.4 items/s
Processing documents: 3/37 (8.1%) - 197.8 items/s
Processing documents: 4/37 (10.8%) - 215.0 items/s
Processing documents: 5/37 (13.5%) - 211.2 items/s
Processing documents: 6/37 (16.2%) - 223.9 items/s
Processing documents: 7/37 (18.9%) - 225.0 items/s
Processing documents: 8/37 (21.6%) - 257.1 items/s
Processing documents: 9/37 (24.3%) - 249.0 items/s
Processing documents: 10/37 (27.0%) - 276.7 items/s
Processing documents: 11/37 (29.7%) - 261.1 items/s
Processing documents: 12/37 (32.4%) - 284.8 items/s
Processing documents: 13/37 (35.1%) - 308.6 items/s
Processing documents: 14/37 (37.8%) - 332.3 items/s
Processing documents: 15/37 (40.5%) - 356.1 items/s
Processing documents: 16/37 (43.2%) - 379.8 items/s
Processing documents: 17/37 (45.9%) - 273.4 items/s
Processing documents: 18/37 (48.6%) - 2
```

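| | Document | Chunks | Avg. Chunk Length |
|---|---|---|---|
| 10 | 11.md | 6 | 3013.833333 |
| 22 | 27.md | 5 | 3036.2 |
| 28 | 33.md | 5 | 2749.8 |
| 30 | 35.md | 4 | 2854.75 |
| 21 | 26.md | 4 | 3045.0 |
| 19 | 24.md | 4 | 3408.0 |
| 26 | 31.md | 4 | 2682.0 |
| 27 | 32.md | 4 | 2696.5 |
| 16 | 20.md | 4 | 3177.75 |
| 15 | 19.md | 4 | 2777.25 |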


```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI  # only needed for the commented-out alternative below
from src import get_azure_openai_chat_model, get_azure_openai_mini_model  # project-local helpers

llm1 = get_azure_openai_chat_model()
llm2 = get_azure_openai_mini_model()  # smaller model; not used in the chain below

# Alternative: call OpenAI directly instead of Azure OpenAI
# llm = ChatOpenAI(temperature=0.0, model="gpt-4o")

# ChatPromptTemplate treats {input_text} as a variable and {{...}} as escaped braces,
# so the model sees literal markers such as {tuple_delimiter}.
# Example 2 intentionally keeps some Norwegian source text; step 4 asks for English descriptions.
prompt_template = """
-Goal-
Given a text document containing medical protocols or guidelines, identify all entities of the specified medical types and all relationships among the identified entities to build a knowledge graph.

-Steps-
1.  Identify all entities within the text. For each identified entity, extract the following information:
    *   `entity_name`: Name of the entity, capitalized (e.g., CPR, UNCONSCIOUSNESS, NEWBORN).
    *   `entity_type`: One of the relevant medical types provided below.
    *   `entity_description`: Comprehensive description of the entity's attributes, purpose, or actions as described in the text.
    Format each entity as ("entity"{{tuple_delimiter}}<entity_name>{{tuple_delimiter}}<entity_type>{{tuple_delimiter}}<entity_description>)

2.  From the entities identified in step 1, identify all pairs of (source_entity, target_entity) that are *clearly and directly related* within the context of the provided text.
    For each pair of related entities, extract the following information:
    *   `source_entity`: name of the source entity, as identified in step 1.
    *   `target_entity`: name of the target entity, as identified in step 1.
    *   `relationship_description`: explanation in English describing the nature of the relationship between the source and target entity based *only* on the text (e.g., "is a symptom of", "is used to treat", "is performed on", "is a type of", "is indicated for", "uses", "should be checked in case of").
    *   `relationship_strength`: an integer score between 1 (weakly related) to 10 (very strongly and explicitly related).
    Format each relationship as ("relationship"{{tuple_delimiter}}<source_entity>{{tuple_delimiter}}<target_entity>{{tuple_delimiter}}<relationship_description>{{tuple_delimiter}}<relationship_strength>)

3.  Return the output as a single list containing all identified entities and relationships. Use **{{record_delimiter}}** as the delimiter between each entity or relationship record. The primary language of the provided text is mixed English and Norwegian.

4.  Translate Norwegian descriptions into English for the `entity_description` and `relationship_description` fields *only*. Keep entity names and types consistent (preferably English where obvious equivalents exist, otherwise use the capitalized term from the text).

5.  When finished, output {{completion_delimiter}}.

-Relevant Medical Entity Types-
[Medical Condition, Symptom, Patient Group, Medical Procedure, Medical Device/Tool, Medication/Substance, Anatomical Location, Guideline Section, Organization/Role, Medical Concept]

-Examples-
######################

Example 1 (Based on 01.md/02.md):

entity_types: [Medical Condition, Symptom, Patient Group, Medical Procedure, Medical Device/Tool, Medication/Substance, Anatomical Location, Guideline Section, Organization/Role, Medical Concept]
text:
# 01 Unconscious adult – not breathing normally
## CRITERIA
- Critical | Unconscious adult, not breathing normally
## SITUATIONAL GUIDANCE & IMPORTANT TO ASCERTAIN
- Help is on the way as I speak to you.
- You must start CPR (reviving the person). I will tell you what to do.
- Don’t hang up, put the phone on speaker if you can.
- If there is defibrillator at hand, get someone else to fetch it. Check Hjertestarterregisteret (the caller must not fetch a defibrillator / AED if alone)
## EMERGENCY RESPONSE
### SCENARIO
- BCPR (CARDIO PULMONARY RESUSCITATION)
### IF YES
- Push down at this rate 30 times.
- Now give rescue breaths.
- Tilt the head back with one hand on the forehead.
- Lift the chin up with the other hand.
- Pinch the nose and give 2 gentle rescue breaths.
- Continue with 30 pushes and 2 rescue breaths until medics take over or the person wakes up.
- Lay the person on the floor, on his / her back.
- Kneel beside the person’s chest.
- Place your hands in the middle of his / her chest... Push down hard...
------------------------
output:
("entity"{{tuple_delimiter}}UNCONSCIOUS ADULT{{tuple_delimiter}}Patient Group{{tuple_delimiter}}An adult patient who is unconscious and not breathing normally, requiring critical intervention.){{record_delimiter}}
("entity"{{tuple_delimiter}}NOT BREATHING NORMALLY{{tuple_delimiter}}Symptom{{tuple_delimiter}}A critical symptom indicating lack of normal respiration, often associated with unconsciousness.){{record_delimiter}}
("entity"{{tuple_delimiter}}CPR{{tuple_delimiter}}Medical Procedure{{tuple_delimiter}}Cardiopulmonary Resuscitation, a life-saving procedure involving chest compressions and rescue breaths, instructed by the call handler.){{record_delimiter}}
("entity"{{tuple_delimiter}}CHEST COMPRESSIONS{{tuple_delimiter}}Medical Procedure{{tuple_delimiter}}A component of CPR involving pushing down hard on the center of the chest 30 times.){{record_delimiter}}
("entity"{{tuple_delimiter}}RESCUE BREATHS{{tuple_delimiter}}Medical Procedure{{tuple_delimiter}}A component of CPR involving tilting the head, lifting the chin, pinching the nose, and giving 2 gentle breaths into the mouth.){{record_delimiter}}
("entity"{{tuple_delimiter}}AED{{tuple_delimiter}}Medical Device/Tool{{tuple_delimiter}}Automated External Defibrillator (also referred to as defibrillator or Hjertestarter), a device to be fetched if available and if someone else is present.){{record_delimiter}}
("entity"{{tuple_delimiter}}CALLER{{tuple_delimiter}}Organization/Role{{tuple_delimiter}}The person calling for help who is instructed to perform CPR.){{record_delimiter}}
("entity"{{tuple_delimiter}}MEDICS{{tuple_delimiter}}Organization/Role{{tuple_delimiter}}Emergency medical personnel who are on their way and will take over CPR upon arrival.){{record_delimiter}}
("entity"{{tuple_delimiter}}CRITERIA{{tuple_delimiter}}Guideline Section{{tuple_delimiter}}Section defining the conditions under which this protocol applies, such as an unconscious adult not breathing normally.){{record_delimiter}}
("entity"{{tuple_delimiter}}CHEST{{tuple_delimiter}}Anatomical Location{{tuple_delimiter}}The location on the body (middle of the chest) where chest compressions are applied during CPR.){{record_delimiter}}
("entity"{{tuple_delimiter}}HEAD{{tuple_delimiter}}Anatomical Location{{tuple_delimiter}}Body part manipulated during rescue breaths (tilt the head back).){{record_delimiter}}
("entity"{{tuple_delimiter}}CHIN{{tuple_delimiter}}Anatomical Location{{tuple_delimiter}}Body part manipulated during rescue breaths (lift the chin up).){{record_delimiter}}
("relationship"{{tuple_delimiter}}UNCONSCIOUS ADULT{{tuple_delimiter}}NOT BREATHING NORMALLY{{tuple_delimiter}}Is a defining symptom for this patient group according to the criteria.{{tuple_delimiter}}9){{record_delimiter}}
("relationship"{{tuple_delimiter}}CPR{{tuple_delimiter}}UNCONSCIOUS ADULT{{tuple_delimiter}}Is the required procedure for an unconscious adult not breathing normally.{{tuple_delimiter}}10){{record_delimiter}}
("relationship"{{tuple_delimiter}}CPR{{tuple_delimiter}}CHEST COMPRESSIONS{{tuple_delimiter}}Incorporates chest compressions as a key component (30 pushes).{{tuple_delimiter}}9){{record_delimiter}}
("relationship"{{tuple_delimiter}}CPR{{tuple_delimiter}}RESCUE BREATHS{{tuple_delimiter}}Incorporates rescue breaths as a key component (2 breaths).{{tuple_delimiter}}9){{record_delimiter}}
("relationship"{{tuple_delimiter}}CALLER{{tuple_delimiter}}CPR{{tuple_delimiter}}Is instructed to perform CPR.{{tuple_delimiter}}8){{record_delimiter}}
("relationship"{{tuple_delimiter}}AED{{tuple_delimiter}}CPR{{tuple_delimiter}}Should be fetched and used during CPR if available and feasible.{{tuple_delimiter}}7){{record_delimiter}}
("relationship"{{tuple_delimiter}}CHEST COMPRESSIONS{{tuple_delimiter}}CHEST{{tuple_delimiter}}Are applied to the middle of the chest.{{tuple_delimiter}}9){{record_delimiter}}
("relationship"{{tuple_delimiter}}RESCUE BREATHS{{tuple_delimiter}}HEAD{{tuple_delimiter}}Involves tilting the head back.{{tuple_delimiter}}8){{record_delimiter}}
("relationship"{{tuple_delimiter}}RESCUE BREATHS{{tuple_delimiter}}CHIN{{tuple_delimiter}}Involves lifting the chin up.{{tuple_delimiter}}8){{record_delimiter}}
("relationship"{{tuple_delimiter}}MEDICS{{tuple_delimiter}}CPR{{tuple_delimiter}}Will take over CPR from the caller upon arrival.{{tuple_delimiter}}8)
{{completion_delimiter}}
#############################

Example 2 (Based on 20.md):

entity_types: [Medical Condition, Symptom, Patient Group, Medical Procedure, Medical Device/Tool, Medication/Substance, Anatomical Location, Guideline Section, Organization/Role, Medical Concept]
text:
# 20 Diabetes
## CRITERIA
- Critical | Drowsy (decreased level of consciousness) - May have a low blood sugar (hypo) | 1.2.3.5.6
## ADVICE
### Advice 5. IF THE PERSON HAS A HYPO
If necessary and the person has a glucagon injection or nasal spray:
– Give one dose (1 mg) Glucagon...
### Advice 7. DROWSY OR DAZED AND UNABLE TO DRINK
– Do not force the person to drink...
– Alternatively, you can put one or two spoonfuls of honey in the mouth. You can also spread honey or granulated sugar on the gums, between the lips and the teeth.
### Advice 8. THE PERSON IS AWAKE ENOUGH TO DRINK
– Give the person several glasses of sugary drink e.g. fizzy drink, cordial, juice or milk.
## INFORMATION
### HYPOGLYKEMI – LAVT BLODSUKKER
Når blodsukkeret synker under 4 mmol/l... Hvis blodsukkeret faller ytterligere under 3 mmol/l, opptrer føling (insulinføling)... Pas. kan hurtig bli sløv, bevisstløs eller få kramper. Behandlingen er rask tilførsel av sukker eller Glukagon®
------------------------
output:
("entity"{{tuple_delimiter}}DIABETES{{tuple_delimiter}}Medical Condition{{tuple_delimiter}}The underlying medical condition being addressed in this protocol.){{record_delimiter}}
("entity"{{tuple_delimiter}}HYPOGLYCEMIA{{tuple_delimiter}}Medical Condition{{tuple_delimiter}}Low blood sugar (føling/insulinføling), defined as blood sugar below 4 mmol/L or 3 mmol/L, potentially causing drowsiness, unconsciousness, or seizures. Referred to as 'hypo'.){{record_delimiter}}
("entity"{{tuple_delimiter}}DROWSINESS{{tuple_delimiter}}Symptom{{tuple_delimiter}}Decreased level of consciousness, listed as a critical criterion possibly indicating hypoglycemia.){{record_delimiter}}
("entity"{{tuple_delimiter}}GLUCAGON{{tuple_delimiter}}Medication/Substance{{tuple_delimiter}}A medication administered via injection (1mg or 0.5mg dose) or nasal spray (3mg) to treat severe hypoglycemia if available.){{record_delimiter}}
("entity"{{tuple_delimiter}}GLUCAGON KIT{{tuple_delimiter}}Medical Device/Tool{{tuple_delimiter}}Refers to the glucagon injection or nasal spray kit the person might have.){{record_delimiter}}
("entity"{{tuple_delimiter}}HONEY{{tuple_delimiter}}Medication/Substance{{tuple_delimiter}}A sugary substance that can be placed in the mouth or on the gums of a drowsy person unable to drink.){{record_delimiter}}
("entity"{{tuple_delimiter}}GRANULATED SUGAR{{tuple_delimiter}}Medication/Substance{{tuple_delimiter}}A sugary substance that can be spread on the gums of a drowsy person unable to drink.){{record_delimiter}}
("entity"{{tuple_delimiter}}SUGARY DRINK{{tuple_delimiter}}Medication/Substance{{tuple_delimiter}}Drinks like fizzy drinks, cordial, juice, or milk given to a person awake enough to drink to raise blood sugar.){{record_delimiter}}
("entity"{{tuple_delimiter}}GUMS{{tuple_delimiter}}Anatomical Location{{tuple_delimiter}}Location in the mouth where honey or granulated sugar can be applied for absorption in a drowsy patient.){{record_delimiter}}
("entity"{{tuple_delimiter}}ADVICE{{tuple_delimiter}}Guideline Section{{tuple_delimiter}}Section providing instructions on how to manage specific situations like hypoglycemia.){{record_delimiter}}
("entity"{{tuple_delimiter}}INFORMATION{{tuple_delimiter}}Guideline Section{{tuple_delimiter}}Section providing background information on conditions like hypoglycemia.){{record_delimiter}}
("relationship"{{tuple_delimiter}}HYPOGLYCEMIA{{tuple_delimiter}}DIABETES{{tuple_delimiter}}Is a potential complication or state related to Diabetes.{{tuple_delimiter}}9){{record_delimiter}}
("relationship"{{tuple_delimiter}}DROWSINESS{{tuple_delimiter}}HYPOGLYCEMIA{{tuple_delimiter}}Is listed as a potential symptom or consequence of Hypoglycemia.{{tuple_delimiter}}8){{record_delimiter}}
("relationship"{{tuple_delimiter}}GLUCAGON{{tuple_delimiter}}HYPOGLYCEMIA{{tuple_delimiter}}Is a treatment for severe Hypoglycemia.{{tuple_delimiter}}10){{record_delimiter}}
("relationship"{{tuple_delimiter}}HONEY{{tuple_delimiter}}HYPOGLYCEMIA{{tuple_delimiter}}Is an alternative treatment for Hypoglycemia in drowsy patients unable to drink.{{tuple_delimiter}}7){{record_delimiter}}
("relationship"{{tuple_delimiter}}GRANULATED SUGAR{{tuple_delimiter}}HYPOGLYCEMIA{{tuple_delimiter}}Is an alternative treatment for Hypoglycemia in drowsy patients unable to drink.{{tuple_delimiter}}7){{record_delimiter}}
("relationship"{{tuple_delimiter}}SUGARY DRINK{{tuple_delimiter}}HYPOGLYCEMIA{{tuple_delimiter}}Is a treatment for Hypoglycemia in patients awake enough to drink.{{tuple_delimiter}}8){{record_delimiter}}
("relationship"{{tuple_delimiter}}GLUCAGON KIT{{tuple_delimiter}}GLUCAGON{{tuple_delimiter}}Is the delivery method for Glucagon medication.{{tuple_delimiter}}9){{record_delimiter}}
("relationship"{{tuple_delimiter}}HONEY{{tuple_delimiter}}GUMS{{tuple_delimiter}}Can be applied to the gums for absorption.{{tuple_delimiter}}7){{record_delimiter}}
("relationship"{{tuple_delimiter}}GRANULATED SUGAR{{tuple_delimiter}}GUMS{{tuple_delimiter}}Can be applied to the gums for absorption.{{tuple_delimiter}}7)
{{completion_delimiter}}
#############################

-Real Data-
######################
entity_types: [Medical Condition, Symptom, Patient Group, Medical Procedure, Medical Device/Tool, Medication/Substance, Anatomical Location, Guideline Section, Organization/Role, Medical Concept]
text: {input_text}
######################
output:
"""

prompt = ChatPromptTemplate.from_template(prompt_template)

chain = prompt | llm1 | StrOutputParser()
```
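The chain returns one raw string. Because `ChatPromptTemplate` unescapes `{{...}}` to `{...}`, the model is shown literal markers such as `{tuple_delimiter}`, and the printed response below shows it echoing them back verbatim. The following is a minimal sketch, not GraphRAG's own parser, that turns such output into entity and relationship dicts; it assumes the model follows the record format from the prompt and silently skips malformed records. The sample string is hypothetical.

```python
TUPLE_DELIM = "{tuple_delimiter}"
RECORD_DELIM = "{record_delimiter}"
COMPLETION_DELIM = "{completion_delimiter}"

def parse_graph_output(raw):
    """Split the model's delimited output into entity and relationship dicts."""
    entities, relationships = [], []
    body = raw.split(COMPLETION_DELIM, 1)[0]  # ignore anything after the completion marker
    for record in body.split(RECORD_DELIM):
        record = record.strip()
        # Records look like ("entity"...) or ("relationship"...); skip anything else
        if not (record.startswith("(") and record.endswith(")")):
            continue
        fields = [field.strip() for field in record[1:-1].split(TUPLE_DELIM)]
        kind = fields[0].strip('"')
        if kind == "entity" and len(fields) == 4:
            entities.append(
                {"name": fields[1], "type": fields[2], "description": fields[3]}
            )
        elif kind == "relationship" and len(fields) == 5:
            try:
                strength = float(fields[4])
            except ValueError:
                strength = None  # the model returned a non-numeric score
            relationships.append(
                {"source": fields[1], "target": fields[2],
                 "description": fields[3], "strength": strength}
            )
    return entities, relationships

# Hypothetical output in the same shape as the model's response
sample = (
    '("entity"{tuple_delimiter}CROSS-ENTROPY{tuple_delimiter}evaluation metrics'
    '{tuple_delimiter}Quantifies the difference between predicted and actual distributions)\n'
    '{record_delimiter}\n'
    '("entity"{tuple_delimiter}PERPLEXITY{tuple_delimiter}evaluation metrics'
    '{tuple_delimiter}Measures how well a model predicts a sample)\n'
    '{record_delimiter}\n'
    '("relationship"{tuple_delimiter}PERPLEXITY{tuple_delimiter}CROSS-ENTROPY'
    '{tuple_delimiter}Perplexity is the exponential of cross-entropy{tuple_delimiter}8)\n'
    '{completion_delimiter}'
)
parsed_entities, parsed_relationships = parse_graph_output(sample)
print(len(parsed_entities), len(parsed_relationships))  # → 2 1
```

Once `response` exists, `parse_graph_output(response)` applies the same logic to the real model output.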

**Creating a Response**

```python
response = chain.invoke({"input_text": documents[0].text})
```

```python
print(response)
```

```plaintext
("entity"{tuple_delimiter}EVALUATION METRICS{tuple_delimiter}evaluation metrics{tuple_delimiter}Evaluation metrics are used to measure the performance of AI models, including metrics like cross-entropy, perplexity, factuality, and context relevance)
{record_delimiter}
("entity"{tuple_delimiter}HYPERPARAMETERS{tuple_delimiter}hyperparameters{tuple_delimiter}Hyperparameters are key settings in model training, such as learning rate, batch size, and number of training epochs, which are adjusted to optimize model performance)
{record_delimiter}
("entity"{tuple_delimiter}CROSS-ENTROPY{tuple_delimiter}evaluation metrics{tuple_delimiter}Cross-entropy is a key metric for evaluating large language models (LLMs) during training or fine-tuning, quantifying the difference between predicted and actual data distributions)
{record_delimiter}
("entity"{tuple_delimiter}PERPLEXITY{tuple_delimiter}evaluation metrics{tuple_delimiter}Perplexity measures how well a probability distribution or mo
```

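### **Looking at Final Entities and Relationships**

The extraction above runs our prompt on a single document. The tables below instead come from a full GraphRAG indexing run, which writes its final entities, relationships, nodes, and community reports as Parquet files under `./ragtest/output/`.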

```python
import pandas as pd

entities = pd.read_parquet('./ragtest/output/create_final_entities.parquet')

entities.head()
```

| | id | human_readable_id | title | type | description | text_unit_ids |
|---|---|---|---|---|---|---|
| 0 | e3a7f24b-88b6-4481-b3a7-c35075a9671f | 0 | GPT-3 | ORGANIZATION | GPT-3 is a large language model developed by O... | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |
| 1 | f55cae4e-dd0d-47a2-912b-f7680147dd31 | 1 | GPT-4 | ORGANIZATION | GPT-4 is an advanced large language model deve... | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |
| 2 | f3e3e46b-6746-45a7-9a26-1432f14c45e4 | 2 | BERT | ORGANIZATION | BERT, which stands for Bidirectional Encoder R... | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |
| 3 | 0491a417-2e18-41c4-ae1e-3e39bf2eb98f | 3 | PALM | ORGANIZATION | PaLM is a large language model developed by Go... | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |
| 4 | 2b7f14f5-d1d5-49f6-bace-46fd1767f99e | 4 | LLAMA | ORGANIZATION | LLAMA is a versatile and advanced model known ... | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |




```python
relationships = pd.read_parquet('./ragtest/output/create_final_relationships.parquet')

relationships.head()
```

| | id | human_readable_id | source | target | description | weight | combined_degree | text_unit_ids |
|---|---|---|---|---|---|---|---|---|
| 0 | b895553a-f860-4d15-bba2-a42f1464e810 | 0 | GPT-3 | GPT-4 | GPT-4 is an advanced version of GPT-3, buildin... | 8.0 | 20 | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |
| 1 | 1548feb2-5a6a-43e6-ab44-81252056193e | 1 | GPT-3 | CHATGPT | ChatGPT is based on the GPT architecture, spec... | 7.0 | 14 | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |
| 2 | e538721d-c023-4994-b918-1efece80ea7e | 2 | GPT-3 | BERT | Both BERT and GPT-3 are pre-trained language m... | 6.0 | 18 | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |
| 3 | 063a2941-df53-4b5c-a66a-45e60bdba604 | 3 | GPT-3 | REINFORCEMENT LEARNING FROM HUMAN FEEDBACK (RLHF) | RLHF is used in training GPT-3 to refine its o... | 7.0 | 13 | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |
| 4 | 7e0cd0b4-4688-434f-a194-f2b9ce397a94 | 4 | GPT-3 | PROMPT ENGINEERING | Prompt engineering is a technique used to guid... | 6.0 | 14 | [ca73c495111f5cadd87e6a7a01aed66647ae6623fdf41... |
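The `combined_degree` column appears to be the sum of the two endpoint degrees: in the nodes table further down, GPT-3 has degree 12, GPT-4 has degree 8, and BERT has degree 6, which matches the combined degrees of 20 (GPT-3/GPT-4) and 18 (GPT-3/BERT) above. A minimal sketch of that computation on a hypothetical toy edge list, assuming each relationship appears only once after merging:

```python
from collections import Counter

def add_combined_degree(relationships):
    """Return copies of the edges annotated with degree(source) + degree(target).

    Degree is the number of relationships a node takes part in, assuming each
    (source, target) pair appears only once after merging.
    """
    degree = Counter()
    for rel in relationships:
        degree[rel["source"]] += 1
        degree[rel["target"]] += 1
    return [
        {**rel, "combined_degree": degree[rel["source"]] + degree[rel["target"]]}
        for rel in relationships
    ]

# Hypothetical toy edge list
toy = [
    {"source": "A", "target": "B"},
    {"source": "A", "target": "C"},
    {"source": "B", "target": "C"},
    {"source": "C", "target": "D"},
]
for rel in add_combined_degree(toy):
    print(rel["source"], rel["target"], rel["combined_degree"])
```

A high combined degree marks an edge between well-connected entities, which makes it one plausible signal for prioritizing relationships when assembling retrieval context.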


### **Community Detection & Node Embedding**

<img src="./media/leidan.png" width=600>

After we have our basic graph with entities and relationships, we analyze its structure in two ways. Community Detection uses the [Leiden algorithm](https://en.wikipedia.org/wiki/Leiden_algorithm) to find explicit groupings in the graph, creating a hierarchy of related entities. The deeper a community sits in the hierarchy (the higher its `level`), the more granular it is. Node Embedding uses [Node2Vec](https://arxiv.org/abs/1607.00653) to create vector representations of each entity, capturing implicit relationships in the graph structure. Together, these complementary approaches capture both explicit connections (through communities) and subtler structural patterns (through embeddings).
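The core of Node2Vec is a biased second-order random walk. Having stepped from `prev` to `cur`, the walker weights each candidate next node by `1/p` if it returns to `prev`, `1` if the candidate is also adjacent to `prev`, and `1/q` otherwise, so `p` and `q` trade off local (breadth-first-like) against outward (depth-first-like) exploration. The sketch below generates such walks on a hypothetical toy graph; in Node2Vec the walks are then fed to a skip-gram (word2vec-style) model to learn the embeddings, a step omitted here. It is an illustration, not the implementation GraphRAG uses.

```python
import random

def transition_weights(adj, prev, cur, p, q):
    """Unnormalized node2vec weights for stepping from `cur`, having arrived from `prev`."""
    weights = []
    for nxt in sorted(adj[cur]):
        if nxt == prev:
            weights.append((nxt, 1.0 / p))  # return to the previous node
        elif nxt in adj[prev]:
            weights.append((nxt, 1.0))      # stay close: nxt is also a neighbor of prev
        else:
            weights.append((nxt, 1.0 / q))  # move outward, away from prev
    return weights

def node2vec_walk(adj, start, length, p=1.0, q=1.0, seed=0):
    """Generate one biased second-order random walk of up to `length` nodes."""
    rng = random.Random(seed)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        if not adj[cur]:
            break  # dead end: no neighbors to move to
        if len(walk) == 1:
            walk.append(rng.choice(sorted(adj[cur])))  # first step is uniform
            continue
        candidates = transition_weights(adj, walk[-2], cur, p, q)
        nodes = [node for node, _ in candidates]
        weights = [weight for _, weight in candidates]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk

# Hypothetical undirected toy graph stored as adjacency sets
adj = {
    "a": {"b", "d"},
    "b": {"a", "c", "d"},
    "c": {"b"},
    "d": {"a", "b"},
}
print(node2vec_walk(adj, "a", length=8, p=2.0, q=0.5))
```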

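Combining the community assignments, the 2D layout coordinates (`x`, `y`), and the relationships (which determine each node's `degree`) gives us our final nodes. Each entity appears once per hierarchy level, which is why GPT-3 shows up twice below: in community 8 at level 0 and in community 43 at level 1.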

```python
nodes = pd.read_parquet('./ragtest/output/create_final_nodes.parquet')

nodes.head(10)
```

| | id | human_readable_id | title | community | level | degree | x | y |
|---|---|---|---|---|---|---|---|---|
| 0 | e3a7f24b-88b6-4481-b3a7-c35075a9671f | 0 | GPT-3 | 8 | 0 | 12 | -4.875545 | 4.017587 |
| 1 | e3a7f24b-88b6-4481-b3a7-c35075a9671f | 0 | GPT-3 | 43 | 1 | 12 | -4.875545 | 4.017587 |
| 2 | f55cae4e-dd0d-47a2-912b-f7680147dd31 | 1 | GPT-4 | 8 | 0 | 8 | -4.561064 | 1.505724 |
| 3 | f55cae4e-dd0d-47a2-912b-f7680147dd31 | 1 | GPT-4 | 46 | 1 | 8 | -4.561064 | 1.505724 |
| 4 | f3e3e46b-6746-45a7-9a26-1432f14c45e4 | 2 | BERT | 8 | 0 | 6 | -5.71058 | 3.546957 |
| 5 | f3e3e46b-6746-45a7-9a26-1432f14c45e4 | 2 | BERT | 44 | 1 | 6 | -5.71058 | 3.546957 |
| 6 | 0491a417-2e18-41c4-ae1e-3e39bf2eb98f | 3 | PALM | 8 | 0 | 3 | -5.309392 | 1.548029 |
| 7 | 0491a417-2e18-41c4-ae1e-3e39bf2eb98f | 3 | PALM | 46 | 1 | 3 | -5.309392 | 1.548029 |
| 8 | 2b7f14f5-d1d5-49f6-bace-46fd1767f99e | 4 | LLAMA | 3 | 0 | 4 | -6.644573 | 0.421999 |
| 9 | 2b7f14f5-d1d5-49f6-bace-46fd1767f99e | 4 | LLAMA | 27 | 1 | 4 | -6.644573 | 0.421999 |
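Because each entity has one row per level, a small pivot recovers its path through the hierarchy. The sketch below copies the `title`, `community`, and `level` values from the ten rows shown above into plain tuples so it runs without the Parquet file. It shows, for instance, that GPT-4 and PaLM share community 46 at level 1, while GPT-3 and BERT land in separate communities, even though all four sit in community 8 at level 0.

```python
from collections import defaultdict

# (title, community, level) copied from the nodes table above
node_rows = [
    ("GPT-3", 8, 0), ("GPT-3", 43, 1),
    ("GPT-4", 8, 0), ("GPT-4", 46, 1),
    ("BERT", 8, 0), ("BERT", 44, 1),
    ("PALM", 8, 0), ("PALM", 46, 1),
    ("LLAMA", 3, 0), ("LLAMA", 27, 1),
]

def communities_by_level(rows):
    """Map each entity title to {level: community id}."""
    membership = defaultdict(dict)
    for title, community, level in rows:
        membership[title][level] = community
    return dict(membership)

def members(rows, community, level):
    """Sorted list of entities assigned to one community at one level."""
    return sorted(title for title, c, lvl in rows if c == community and lvl == level)

print(communities_by_level(node_rows)["GPT-3"])  # → {0: 8, 1: 43}
print(members(node_rows, 8, 0))                  # → ['BERT', 'GPT-3', 'GPT-4', 'PALM']
print(members(node_rows, 46, 1))                 # → ['GPT-4', 'PALM']
```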


At this point the graph is effectively complete; however, we can add a few extra steps that enable more advanced retrieval.

### Community Report Generation & Summarization

Now that we have clear community groupings, we can run another generation step that summarizes the main concepts of each community in the hierarchy into a report, plus a shorter summary of that report. As with the nodes, these summaries are also run through an embedding model and stored in a vector store.

```python
community_reports = pd.read_parquet('./ragtest/output/create_final_community_reports.parquet')

community_reports.head()
```

| | id | human_readable_id | community | parent | level | title | summary | full_content | rank | rank_explanation | findings | full_content_json | period | size |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | a85d59a64a054114982b1ce6e1ced591 | 61 | 61 | 32 | 2 | Amazon Bedrock and AI Model Providers | The community is centered around Amazon Bedroc... | # Amazon Bedrock and AI Model Providers\n\nThe... | 8.5 | The impact severity rating is high due to Amaz... | [{'explanation': 'Amazon Bedrock is a pivotal ... | {\n "title": "Amazon Bedrock and AI Model P... | 2024-12-18 | 9 |
| 1 | 6aafc6eeddd848bc8ffbfb9177790c26 | 62 | 62 | 32 | 2 | AWS and SageMaker JumpStart | The community is centered around Amazon Web Se... | # AWS and SageMaker JumpStart\n\nThe community... | 8.5 | The impact severity rating is high due to AWS'... | [{'explanation': 'Amazon Web Services (AWS) is... | {\n "title": "AWS and SageMaker JumpStart",... | 2024-12-18 | 2 |
| 2 | e13e3ed0a0b74fd090319957ae9f3e1e | 14 | 14 | 0 | 1 | PPO for LLM Alignment and Reinforcement Learni... | The community centers around the study 'PPO fo... | # PPO for LLM Alignment and Reinforcement Lear... | 7.5 | The impact severity rating is high due to the ... | [{'explanation': 'The study 'PPO for LLM Align... | {\n "title": "PPO for LLM Alignment and Rei... | 2024-12-18 | 7 |
| 3 | 828baab1461b439ea71203ad8fd0aae5 | 15 | 15 | 0 | 1 | HuggingFace and Advanced NLP Tools | The community is centered around HuggingFace, ... | # HuggingFace and Advanced NLP Tools\n\nThe co... | 8.5 | The impact severity rating is high due to Hugg... | [{'explanation': 'HuggingFace is a prominent e... | {\n "title": "HuggingFace and Advanced NLP ... | 2024-12-18 | 7 |
| 4 | 791da6e7031e45228442b277e7d912c6 | 16 | 16 | 0 | 1 | OpenAI and AI Development Platforms | The community is centered around OpenAI, a lea... | # OpenAI and AI Development Platforms\n\nThe c... | 8.5 | The impact severity rating is high due to the ... | [{'explanation': 'OpenAI is a central entity i... | {\n "title": "OpenAI and AI Development Pla... | 2024-12-18 | 7 |
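The `parent` and `level` columns encode the community hierarchy: reports 61 and 62 are level-2 communities whose parent is community 32, while 14, 15 and 16 are level-1 communities under community 0. A minimal, standard-library sketch of walking that structure, using only the `(community, parent, level)` values from the five rows above (the helper names `build_children` and `at_level` are hypothetical):

```python
from collections import defaultdict

# (community, parent, level) triples copied from the five report rows above
reports = [
    (61, 32, 2),
    (62, 32, 2),
    (14, 0, 1),
    (15, 0, 1),
    (16, 0, 1),
]

def build_children(reports):
    """Group child community ids under their parent community id."""
    children = defaultdict(list)
    for community, parent, _level in reports:
        children[parent].append(community)
    return dict(children)

def at_level(reports, level):
    """Return the community ids that sit at one level of the hierarchy."""
    return [community for community, _parent, lvl in reports if lvl == level]

children = build_children(reports)
print(children[32])          # [61, 62]
print(at_level(reports, 1))  # [14, 15, 16]
```

Selecting reports by level is what lets a search trade breadth for specificity: lower levels cover larger, coarser communities, and higher levels cover smaller, more specific ones.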


In [83]:
print(community_reports["full_content"][0])

# Amazon Bedrock and AI Model Providers

The community is centered around Amazon Bedrock, a service by AWS that facilitates access to foundation models from various AI innovators. Key entities include AI21 Labs, Anthropic, Cohere, Mistral AI, and Stability AI, all of which provide models accessible through Amazon Bedrock. The service integrates with AWS infrastructure, including AWS Lambda and AWS SageMaker, to support scalable AI model deployment.

## Amazon Bedrock as a central service

Amazon Bedrock is a pivotal service within the AWS ecosystem, designed to simplify access to high-performing foundation models for generative AI applications. It integrates seamlessly with other AWS services, such as Amazon S3, AWS Lambda, and AWS SageMaker, to facilitate the fine-tuning and deployment of AI models. This integration underscores its importance in the AI landscape, providing a comprehensive suite of tools for scalable AI model deployment [Data: Entities (206); Relationships (281, 326, 3

In [82]:
print(community_reports["summary"][0])

The community is centered around Amazon Bedrock, a service by AWS that facilitates access to foundation models from various AI innovators. Key entities include AI21 Labs, Anthropic, Cohere, Mistral AI, and Stability AI, all of which provide models accessible through Amazon Bedrock. The service integrates with AWS infrastructure, including AWS Lambda and AWS SageMaker, to support scalable AI model deployment.


### The Final Graph!

<img src="./media/ghraphrag_viz.svg" width=800>

*[Full Size PDF](./ghraphrag_viz.pdf)*

---

## GraphRAG Retrieval

<img src="./media/kg_retrieval.png" width=600>

*[Unifying Large Language Models and Knowledge Graphs: A Roadmap](https://arxiv.org/pdf/2306.08302)*

With our knowledge graph constructed and its hierarchical communities delineated, we can now perform several types of search that take advantage of both the graph structure and the multiple levels of specificity across our communities. Specifically:

1. **Global Search**: Uses the LLM-generated community reports from a specified level of the graph's community hierarchy as context to generate a response.
2. **Local Search**: Combines structured data from the knowledge graph with unstructured data from the input document(s) to augment the LLM context with relevant entity information.
3. **DRIFT Search**: Dynamic Reasoning and Inference with Flexible Traversal, an approach to local search that also includes community information in the search process, thereby combining global and local search.

**GraphRAG Retrieval Function**

*Note: For a simpler example, this wraps the [GraphRAG CLI tool](https://microsoft.github.io/graphrag/cli/) in a function instead of using the GraphRAG [library](https://microsoft.github.io/graphrag/examples_notebooks/api_overview/). As a result, the notebook must run in the same environment/kernel where GraphRAG is installed.*

In [84]:
import subprocess
from typing import Optional

def query_graphrag(
    query: str,
    method: str = "global",
    root_path: str = "./ragtest",
    timeout: Optional[int] = None,
    community_level: int = 2,
    dynamic_community_selection: bool = False
) -> str:
    """
    Execute a GraphRAG query using the CLI tool.
    
    Args:
        query (str): The query string to process
        method (str): Query method (e.g., "global", "local", or "drift")
        root_path (str): Path to the root directory
        timeout (int, optional): Timeout in seconds for the command
        community_level (int): The community level in the Leiden community hierarchy (default: 2)
        dynamic_community_selection (bool): Whether to use global search with dynamic community selection (default: False)
    
    Returns:
        str: The output from GraphRAG
        
    Raises:
        subprocess.CalledProcessError: If the command fails
        subprocess.TimeoutExpired: If the command times out
        ValueError: If community_level is negative
    """
    # Validate community level
    if community_level < 0:
        raise ValueError("Community level must be non-negative")
    
    # Construct the base command
    command = [
        'graphrag', 'query',
        '--root', root_path,
        '--method', method,
        '--query', query,
        '--community-level', str(community_level)
    ]
    
    # Add dynamic community selection flag if enabled
    if dynamic_community_selection:
        command.append('--dynamic-community-selection')
    
    try:
        # Execute the command and capture output
        result = subprocess.run(
            command,
            capture_output=True,
            text=True,
            timeout=timeout
        )
        
        # Check if the command was successful
        result.check_returncode()
        
        return result.stdout.strip()
        
    except subprocess.CalledProcessError as e:
        # Re-raise with a readable message, chaining the original exception
        error_message = f"Command failed with exit code {e.returncode}\nError: {e.stderr}"
        raise subprocess.CalledProcessError(
            e.returncode,
            e.cmd,
            output=e.output,
            stderr=error_message
        ) from e

### Local Search

<img src="./media/local_search.png" width=900>

GraphRAG's local search is the closest to regular semantic RAG search. It combines structured data from the knowledge graph with unstructured data from the input documents to augment the LLM context with relevant entity information. In essence, we first use semantic search to find the entities most relevant to the query. These become entry points into the graph, which we can then traverse: starting from them, we gather connected text chunks, community reports, other entities, and the relationships between them. All of the retrieved data is then filtered and ranked to fit within a predefined context window.
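The three steps above (find entry-point entities, traverse to connected data, then rank and trim to a budget) can be sketched in plain Python. This is a toy illustration, not GraphRAG's implementation: the bag-of-words `embed` function stands in for a real embedding model, the mini-graph (`entities`, `relationships`, `text_units`) is hypothetical, and word count is used as a rough stand-in for tokens.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical mini-graph; GraphRAG reads the real data from its parquet outputs
entities = {
    "LORA": "lora low rank adaptation peft method",
    "RAG": "retrieval augmented generation with external data",
    "GPT-4": "large language model from openai",
}
relationships = [
    ("LORA", "GPT-4", "lora can fine tune large models"),
    ("RAG", "GPT-4", "rag supplies retrieved context to gpt-4"),
]
text_units = {
    "LORA": ["chunk about lora memory savings"],
    "RAG": ["chunk about rag and frequently updated data"],
    "GPT-4": ["chunk about gpt-4 capabilities"],
}

def local_search_context(query, top_k=1, budget_words=12):
    q = embed(query)
    # 1. Entry points: entities whose descriptions are most similar to the query
    seeds = sorted(entities, key=lambda e: cosine(q, embed(entities[e])), reverse=True)[:top_k]
    # 2. Traverse: collect text chunks, relationships and neighbors linked to each seed
    candidates = []
    for seed in seeds:
        candidates += text_units[seed]
        for src, dst, desc in relationships:
            if seed in (src, dst):
                candidates.append(desc)
                neighbor = dst if src == seed else src
                candidates.append(f"{neighbor}: {entities[neighbor]}")
    # 3. Rank candidates by similarity and keep what fits the word budget
    context, used = [], 0
    for item in sorted(candidates, key=lambda c: cosine(q, embed(c)), reverse=True):
        words = len(item.split())
        if used + words <= budget_words:
            context.append(item)
            used += words
    return seeds, context

seeds, context = local_search_context("when should we use lora peft")
print(seeds)  # ['LORA']
for line in context:
    print("-", line)
```

With a 12-word budget, the neighbor description for GPT-4 is ranked last and dropped, which mirrors (in miniature) how retrieved data is trimmed to fit the context window.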

In [88]:
result = query_graphrag(
    query="How does a company choose between RAG, fine-tuning, and different PEFT approaches?",
    method="local"
)
print("Query result:")
print(result)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Query result:
INFO: Vector Store Args: {
    "type": "lancedb",
    "db_uri": "/Users/adamlucek/Desktop/github/GraphRAG/ragtest/output/lancedb",
    "container_name": "==== REDACTED ====",
    "overwrite": true
}
creating llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_chat", 'encoding_model': 'cl100k_base', 'model': 'gpt-4o', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'audience': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25, 'responses': None}
creating embedding llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_embedding", 'encoding_model': 'cl100k_base', 'model': 'text-embedding-3-small', 'max_tokens': 4000, 'tem

### Global Search

<img src="./media/global_search.png" width=1000>

Through the community detection performed during the indexing process outlined above, we created community reports that summarize the high-level themes of each grouping. Having these community summaries at multiple levels lets us do something traditional RAG handles poorly: answer queries about broad themes and ideas across our unstructured data.

To capture as much broad information as possible in an efficient manner, GraphRAG implements a [map reduce](https://en.wikipedia.org/wiki/MapReduce) approach. Given a query, the community reports at a specified level of the hierarchy are collected, shuffled, and split into chunks. In the map step, each chunk is used to generate a list of points, each with its own "importance score". In the reduce step, these intermediate points are ranked and filtered to keep the most important ones, and the resulting aggregate is passed to the LLM as context for the final response.
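A minimal sketch of that map-reduce flow, assuming hypothetical one-line reports and a toy scorer: in GraphRAG the map step is an LLM call that returns points with importance ratings, whereas here `map_step` simply counts shared query words. The function names and parameters (`chunk_size`, `top_n`, `seed`) are illustrative only.

```python
import random

def map_step(chunk, query):
    """Stand-in for the LLM 'map' call: turn a chunk of reports into
    (point, importance score) pairs. The score is the number of query words
    a report shares, a toy proxy for an LLM-assigned rating."""
    query_words = set(query.lower().split())
    points = []
    for report in chunk:
        score = len(query_words & set(report.lower().split()))
        if score > 0:
            points.append((report, score))
    return points

def global_search_context(reports, query, chunk_size=2, top_n=3, seed=0):
    """Shuffle and chunk the reports, map each chunk to scored points,
    then reduce to the top_n points that would form the final LLM context."""
    shuffled = list(reports)
    random.Random(seed).shuffle(shuffled)
    chunks = [shuffled[i:i + chunk_size] for i in range(0, len(shuffled), chunk_size)]
    # Map: each chunk is processed independently, so the calls could run in parallel
    points = [point for chunk in chunks for point in map_step(chunk, query)]
    # Reduce: rank by importance and keep only the strongest points
    points.sort(key=lambda p: p[1], reverse=True)
    return [text for text, _score in points[:top_n]]

# Hypothetical one-line community reports
reports = [
    "rag suits frequently updated external data",
    "fine tuning adapts model style and domain knowledge",
    "peft methods like lora cut memory cost",
    "amazon bedrock hosts foundation models",
]
query = "choose between rag fine tuning and peft"
for point in global_search_context(reports, query):
    print("-", point)
```

Because every chunk is scored independently, the map step scales with the number of reports while the reduce step keeps the final prompt small.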

In [86]:
result = query_graphrag(
    query="How does a company choose between RAG, fine-tuning, and different PEFT approaches?",
    method="global"
)
print("Query result:")
print(result)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Query result:
creating llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_chat", 'encoding_model': 'cl100k_base', 'model': 'gpt-4o', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'audience': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25, 'responses': None}

SUCCESS: Global Search Response:
### Choosing Between RAG, Fine-Tuning, and PEFT Approaches

When a company is deciding between Retrieval-Augmented Generation (RAG), fine-tuning, and Parameter-Efficient Fine-Tuning (PEFT) approaches, several key factors must be considered. These factors include the specific requirements of the application, the need for external data integration, co
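Since `query_graphrag` returns the CLI's full stdout, the answer arrives mixed with log lines such as `creating llm client with ...`. A small helper can keep only the text after the `SUCCESS: ... Response:` line. This is a sketch under an assumption: the marker format is taken from the global search output above, and whether local and DRIFT search (or other GraphRAG versions) print the same line is not confirmed here. The name `extract_response` is hypothetical.

```python
def extract_response(cli_output: str) -> str:
    """Return the text after a 'SUCCESS: ... Response:' line, or the whole
    output if no such marker is found (assumed log format)."""
    lines = cli_output.splitlines()
    for i, line in enumerate(lines):
        if line.startswith("SUCCESS:") and line.rstrip().endswith("Response:"):
            return "\n".join(lines[i + 1:]).strip()
    return cli_output.strip()

# Shortened sample modeled on the global search output above
sample = """creating llm client with {...}

SUCCESS: Global Search Response:
### Choosing Between RAG, Fine-Tuning, and PEFT Approaches

When a company is deciding..."""
print(extract_response(sample))
```

Usage would look like `answer = extract_response(query_graphrag(query, method="global"))`; falling back to the full output avoids silently discarding anything if the marker is missing.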

### DRIFT Search

<img src="./media/drift_search.png" width=1000>

[Dynamic Reasoning and Inference with Flexible Traversal](https://www.microsoft.com/en-us/research/blog/introducing-drift-search-combining-global-and-local-search-methods-to-improve-quality-and-efficiency/), or DRIFT, is a novel GraphRAG concept introduced by Microsoft as an approach to local search queries that include community information in the search process.

The user's query is first processed with [Hypothetical Document Embedding (HyDE)](https://arxiv.org/pdf/2212.10496), which generates a hypothetical document resembling those already in the graph, but written around the user's query. This document is embedded and used for semantic retrieval of the top-k most relevant community reports. From these matches, DRIFT generates an initial answer along with several follow-up questions, a lightweight version of global search that Microsoft refers to as the primer.
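The retrieval half of the primer can be sketched as follows: instead of embedding the raw query, we embed an LLM-drafted hypothetical document and use it to rank community report summaries. Everything here is an assumption for illustration: `embed` is a toy bag-of-words stand-in for an embedding model, `fake_generate` stands in for the LLM call, and the summaries are shortened from the report titles above. The sketch stops before generating the initial answer and follow-up questions.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def drift_primer_reports(query, report_summaries, generate, k=2):
    """HyDE-style retrieval: embed a hypothetical document, not the raw query,
    and return the k most similar community report summaries."""
    hyde_doc = generate(query)
    q = embed(hyde_doc)
    ranked = sorted(report_summaries, key=lambda r: cosine(q, embed(r)), reverse=True)
    return ranked[:k]

def fake_generate(query):
    # Hypothetical stand-in for an LLM drafting a report-like passage about the query
    return "the community is centered around " + query

summaries = [
    "the community is centered around amazon bedrock and ai model providers",
    "the community is centered around huggingface and advanced nlp tools",
    "the community centers around the study ppo for llm alignment",
]
print(drift_primer_reports("huggingface nlp tools", summaries, fake_generate, k=1))
```

The idea behind HyDE is that a drafted document tends to share vocabulary and style with the stored reports, so it can be a better search key than a short question.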

Once the primer phase is complete, a local search is executed for each generated follow-up question. Each local search produces both an intermediate answer and new follow-up questions, creating a refinement loop. This loop runs for two iterations (Microsoft notes that future research aims to develop reward functions for smarter termination). What makes these local searches distinctive is that they are informed by both community-level knowledge and detailed entity/relationship data. This lets DRIFT find relevant information even when the initial query diverges from the indexing persona, and adapt its approach as new information emerges during the search.

The final output is structured as a hierarchy of questions and answers, ranked by their relevance to the original query. Map reduce is used again with an equal weighting on all intermediate answers, then passed to the language model for a final response. DRIFT cleverly combines global and local search with guided exploration to provide both broad context and specific details in responses.
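The refinement loop and final aggregation can be sketched as a breadth-wise expansion over follow-up questions. This is a hypothetical illustration, not Microsoft's implementation: `fake_local_search` stands in for a GraphRAG local search, the dict-based tree and function names are invented, and relevance ranking of the question hierarchy is omitted. Only the two-iteration limit and the equal weighting of intermediate answers come from the description above.

```python
def drift_refine(primer_answer, follow_ups, local_search, iterations=2):
    """Answer pending follow-up questions with local search, round by round;
    each answer may propose new follow-ups for the next round."""
    tree = [{"question": None, "answer": primer_answer, "depth": 0}]
    pending = list(follow_ups)
    for depth in range(1, iterations + 1):
        next_round = []
        for question in pending:
            answer, new_questions = local_search(question)
            tree.append({"question": question, "answer": answer, "depth": depth})
            next_round.extend(new_questions)
        pending = next_round  # anything left after the last round is dropped
    return tree

def reduce_answers(tree):
    """Equal-weight aggregation: every intermediate answer contributes once."""
    return "\n".join(node["answer"] for node in tree)

def fake_local_search(question):
    # Hypothetical stand-in: answers the question and proposes one deeper follow-up
    follow_ups = [] if question.endswith("(detail)") else [question + " (detail)"]
    return "answer to: " + question, follow_ups

tree = drift_refine("primer answer", ["when to use RAG?", "when to use LoRA?"], fake_local_search)
print([node["depth"] for node in tree])  # [0, 1, 1, 2, 2]
print(reduce_answers(tree))
```

The joined answers would then be passed to the language model for the final response.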

In [89]:
result = query_graphrag(
    query="How does a company choose between RAG, fine-tuning, and different PEFT approaches?",
    method="drift"
)
print("Query result:")
print(result)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


Query result:
INFO: Vector Store Args: {
    "type": "lancedb",
    "db_uri": "/Users/adamlucek/Desktop/github/GraphRAG/ragtest/output/lancedb",
    "container_name": "==== REDACTED ====",
    "overwrite": true
}
creating llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_chat", 'encoding_model': 'cl100k_base', 'model': 'gpt-4o', 'max_tokens': 4000, 'temperature': 0.0, 'top_p': 1.0, 'n': 1, 'frequency_penalty': 0.0, 'presence_penalty': 0.0, 'request_timeout': 180.0, 'api_base': None, 'api_version': None, 'organization': None, 'proxy': None, 'audience': None, 'deployment_name': None, 'model_supports_json': True, 'tokens_per_minute': 0, 'requests_per_minute': 0, 'max_retries': 10, 'max_retry_wait': 10.0, 'sleep_on_rate_limit_recommendation': True, 'concurrent_requests': 25, 'responses': None}
creating embedding llm client with {'api_key': 'REDACTED,len=51', 'type': "openai_embedding", 'encoding_model': 'cl100k_base', 'model': 'text-embedding-3-small', 'max_tokens': 4000, 'tem

---

## Comparing to Regular Vector Database Retrieval

<img src="./media/basic_retrieval.png" width=600>
 
For comparison, let's look back at traditional RAG: chunking, embedding, and similarity-based retrieval.

**Instantiate our Database**

For this we'll be using [ChromaDB](https://www.trychroma.com) with the same chunks as were loaded into our graph.

In [None]:
import chromadb

chroma_client = chromadb.PersistentClient(path="./notebook/chromadb")
paper_collection = chroma_client.get_or_create_collection(name="paper_collection")

**Embed Chunks Into Collection**

In [None]:
# Add each chunk with a stable, unique ID
for i, text in enumerate(texts):
    paper_collection.add(
        documents=[text],
        ids=[f"chunk_{i}"]
    )

**Retrieval Function**

In [None]:
def chroma_retrieval(query, num_results=5):
    results = paper_collection.query(
        query_texts=[query],
        n_results=num_results
    )
    return results

**RAG Prompt & Chain**

In [90]:
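# Assumes `llm` is the chat model instantiated earlier in this notebook
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

rag_prompt_template = """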
Generate a response of the target length and format that responds to the user's question, summarizing all information in the input data tables appropriate for the response length and format, and incorporating any relevant general knowledge.

If you don't know the answer, just say so. Do not make anything up.

Do not include information where the supporting evidence for it is not provided.

Context: {retrieved_docs}

User Question: {query}

"""

rag_prompt = ChatPromptTemplate.from_template(rag_prompt_template)

rag_chain = rag_prompt | llm | StrOutputParser()

**RAG Function**

In [91]:
def chroma_rag(query):
    retrieved_docs = chroma_retrieval(query)["documents"][0]
    response = rag_chain.invoke({"retrieved_docs": retrieved_docs, "query": query})
    return response

**RAG Response**

In [92]:
response = chroma_rag("How does a company choose between RAG, fine-tuning, and different PEFT approaches?")
print(response)

When choosing between Retrieval-Augmented Generation (RAG), fine-tuning, and different Parameter-Efficient Fine-Tuning (PEFT) approaches, a company should consider several factors:

1. **Data Access and Updates**: RAG is preferable for applications requiring access to external data sources or environments where data frequently updates. It provides dynamic data retrieval capabilities and is less prone to generating incorrect information.

2. **Model Behavior and Domain-Specific Knowledge**: Fine-tuning is suitable when the model needs to adjust its behavior, writing style, or incorporate domain-specific knowledge. It is effective if there is ample domain-specific, labeled training data available.

3. **Resource Constraints and Efficiency**: PEFT approaches like LoRA and DEFT are designed to reduce computational and resource requirements. LoRA focuses on low-rank matrices to reduce memory usage and computational load, while DEFT optimizes the fine-tuning process by focusing on the most c

---
## Discussion

**Traditional/Naive RAG:**

Benefits:
- Simpler implementation and deployment
- Works well for straightforward information retrieval tasks
- Good at handling unstructured text data
- Lower computational overhead

Drawbacks:
- Loses structural information when chunking documents
- Can break up related content during text segmentation
- Limited ability to capture relationships between different pieces of information
- May struggle with complex reasoning tasks requiring connecting multiple facts
- Potential for incomplete or fragmented answers due to chunking boundaries

**GraphRAG:**

Benefits:
- Preserves structural relationships and hierarchies in the knowledge
- Better at capturing connections between related information
- Can provide more complete and contextual answers
- Improved retrieval accuracy by leveraging graph structure
- Better supports complex reasoning across multiple facts
- Can maintain document coherence better than chunk-based approaches
- More interpretable due to explicit knowledge representation

Drawbacks:
- More complex to implement and maintain
- Requires additional processing to construct and update knowledge graphs
- Higher computational overhead for graph operations
- May require domain expertise to define graph schema/structure
- More challenging to scale to very large datasets
- Additional storage requirements for graph structure

**Key Differentiators:**
1. Knowledge Representation: Traditional RAG treats everything as flat text chunks, while GraphRAG maintains structured relationships in a graph format

2. Context Preservation: GraphRAG better preserves context and relationships between different pieces of information compared to the chunking approach of traditional RAG

3. Reasoning Capability: GraphRAG enables better multi-hop reasoning and connection of related facts through graph traversal, while traditional RAG is more limited to direct retrieval

4. Answer Quality: GraphRAG tends to produce more complete and coherent answers since it can access related information through graph connections rather than being limited by chunk boundaries
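To make the multi-hop point concrete, consider a question like "Which AWS services can Anthropic's models be used alongside?" Answering it requires two hops: Anthropic to Amazon Bedrock, then Amazon Bedrock to AWS Lambda. A chunk retriever only succeeds if a single chunk happens to mention both ends, while a graph can follow the path explicitly. The sketch below runs a breadth-first search over a few edges paraphrased from the Amazon Bedrock community report printed earlier; the edge list and helper names are illustrative assumptions, not GraphRAG's internal data structures.

```python
from collections import deque

# Edges paraphrased from the Amazon Bedrock community report printed earlier
edges = [
    ("Anthropic", "Amazon Bedrock"),       # models accessible through Bedrock
    ("Amazon Bedrock", "AWS Lambda"),      # Bedrock integrates with Lambda
    ("Amazon Bedrock", "Amazon S3"),
    ("Amazon Bedrock", "AWS SageMaker"),
]

def build_graph(edges):
    """Undirected adjacency sets keyed by entity name."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def shortest_path(graph, start, goal):
    """Breadth-first search returning the shortest entity path, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbor in sorted(graph.get(path[-1], ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None

graph = build_graph(edges)
print(shortest_path(graph, "Anthropic", "AWS Lambda"))
# ['Anthropic', 'Amazon Bedrock', 'AWS Lambda']
```

The returned path doubles as an explanation of the answer, which is part of why graph-based retrieval is considered more interpretable.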

The choice between traditional RAG and GraphRAG often depends on the specific use case, with GraphRAG being particularly valuable when maintaining relationships between pieces of information is important or when complex reasoning is required. It is also worth noting that GraphRAG itself still relies on regular embedding and retrieval methods. The two approaches complement each other!