# Graph RAG

"Graph RAG" has become an incredible buzz term of late. The goal of this notebook is to provide a simple and intuitive demonstration of what Graph RAG accomplishes, and how to use two open source, embedded databases (Kùzu, for graph traversal, and LanceDB, for vector search) to combine the benefits of vector and graph databases.

## What is Graph RAG?

At its core, Graph RAG aims to combine the power of knowledge graphs with the well-known benefits of vector (semantic) search. Knowledge graphs are great for storing and traversing relationships between entities, while vector embeddings are great for capturing the semantic similarity between chunks of data. By combining the two, we can create a powerful system that can answer complex queries that require both semantic similarity and relationship traversal.

## How and why does Graph RAG work in practice?

It's worth going over how and why Graph RAG makes sense, intuitively. Semantic search based on vector similarity leverages the _implicit_ relationships between entities - two vector embeddings that represent different chunks of text may be close to each other in vector space, indicating that they are semantically similar. On the other hand, knowledge graphs store _explicit_ relationships between entities - two nodes in a graph may be connected by an explicit relationship (termed an "edge"), indicating that they are related.

A system that uses both kinds of relationship can answer queries that need semantic similarity and relationship traversal at the same time. The code in this notebook demonstrates this.
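To make the implicit/explicit distinction concrete, here is a minimal pure-Python sketch. The three-dimensional vectors, the chunk texts and the edge list are made-up toy data (real embeddings come from an embedding model and have far more dimensions), and the helper names `most_similar` and `neighbours` are hypothetical, not part of any library used in this notebook.

```python
import math

# Hypothetical 3-dimensional embeddings for three text chunks.
chunk_vectors = {
    "Pierre and Jacques discovered piezoelectricity": [0.9, 0.1, 0.2],
    "Marie Curie won the Nobel Prize in Physics": [0.1, 0.9, 0.3],
    "Paul Langevin was a student of Pierre Curie": [0.5, 0.2, 0.8],
}

# Explicit edges stored as (source, relation, target) triples.
edges = [
    ("Pierre Curie", "WORKED_WITH", "Jacques Curie"),
    ("Pierre Curie", "WORKED_WITH", "Paul Langevin"),
    ("Marie Curie", "WON", "Nobel Prize in Physics"),
]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_vector, vectors):
    """Implicit relationship: pick the chunk closest to the query in vector space."""
    return max(vectors, key=lambda text: cosine_similarity(query_vector, vectors[text]))


def neighbours(entity, relation, triples):
    """Explicit relationship: follow edges of one type starting from an entity."""
    return [target for source, rel, target in triples if source == entity and rel == relation]


# Hypothetical embedding of a question about piezoelectricity.
query_vector = [0.8, 0.2, 0.1]
print(most_similar(query_vector, chunk_vectors))
print(neighbours("Pierre Curie", "WORKED_WITH", edges))
```

The first lookup depends entirely on where the vectors happen to sit, while the second returns exactly the edges that were stored, regardless of how the source text phrased them.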

## Part 1: Graph-only retrieval

First, let's demonstrate how to extract information into a knowledge graph and store it in [Kùzu](https://kuzudb.com/), an open source, embedded graph database.

In [1]:
import weave

weave.init("llamaindex_demo")

Logged in as Weights & Biases user: alonso-silva.
View Weave data at https://wandb.ai/alonso-silva/llamaindex_demo/weave


<weave.trace.weave_client.WeaveClient at 0x7482edf1ab40>

In [2]:
import os
import shutil
import warnings
from typing import List, Literal, Optional
from dotenv import load_dotenv
from llama_index.core import PropertyGraphIndex, Settings, SimpleDirectoryReader
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.llms.openai import OpenAI
from llama_index.core.indices.property_graph import SchemaLLMPathExtractor
from llama_index.graph_stores.kuzu import KuzuPropertyGraphStore

import kuzu

load_dotenv()

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
COHERE_API_KEY = os.environ.get("COHERE_API_KEY")

shutil.rmtree("test_kuzudb", ignore_errors=True)
db = kuzu.Database("test_kuzudb")

warnings.filterwarnings("ignore")

In [3]:
import nest_asyncio
nest_asyncio.apply()

In [4]:
# Set up the embedding model and LLM
embed_model = OpenAIEmbedding(model_name="text-embedding-3-small")
extraction_llm = OpenAI(model="gpt-4o-mini", temperature=0.0)
generation_llm = OpenAI(model="gpt-4o-mini", temperature=0.3)

documents = SimpleDirectoryReader("./data/curie/").load_data()

entities = Literal["PERSON", "NOBEL_PRIZE", "LOCATION", "DISCOVERY"]
relations = Literal["DISCOVERED", "IS_MARRIED_TO", "WORKED_WITH", "WON"]

# Define explicit relationship directions as a list of triples
# The graph extraction process will be guided by this "schema"
validation_schema = [
    ("PERSON", "IS_MARRIED_TO", "PERSON"),
    ("PERSON", "WORKED_WITH", "PERSON"),
    ("PERSON", "WON", "NOBEL_PRIZE"),
    ("PERSON", "DISCOVERED", "DISCOVERY"),
]

In [5]:
graph_store = KuzuPropertyGraphStore(
    db,
    has_structured_schema=True,
    relationship_schema=validation_schema,
)

In [6]:
schema_path_extractor = SchemaLLMPathExtractor(
    llm=extraction_llm,
    possible_entities=entities,
    possible_relations=relations,
    kg_validation_schema=validation_schema,
    strict=True,  # if false, will allow triples outside of the schema
)
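As a conceptual illustration of what a strict schema buys us, the sketch below filters candidate triples against the same list of schema triples. It is not LlamaIndex's implementation: the `filter_by_schema` helper and the candidate triples are hypothetical, and each entity is assumed to arrive paired with its label.

```python
from typing import List, Tuple

TypedEntity = Tuple[str, str]                      # (name, label)
TypedTriple = Tuple[TypedEntity, str, TypedEntity]  # (subject, relation, object)

# Same schema triples as `validation_schema` in the notebook.
validation_schema = [
    ("PERSON", "IS_MARRIED_TO", "PERSON"),
    ("PERSON", "WORKED_WITH", "PERSON"),
    ("PERSON", "WON", "NOBEL_PRIZE"),
    ("PERSON", "DISCOVERED", "DISCOVERY"),
]


def filter_by_schema(
    typed_triples: List[TypedTriple],
    schema: List[Tuple[str, str, str]],
) -> List[Tuple[str, str, str]]:
    """Keep only triples whose (subject label, relation, object label) is in the schema."""
    allowed = set(schema)
    kept = []
    for (subj, subj_label), relation, (obj, obj_label) in typed_triples:
        if (subj_label, relation, obj_label) in allowed:
            kept.append((subj, relation, obj))
    return kept


# Hypothetical extractor output, including two triples the schema does not allow.
candidates = [
    (("Marie Curie", "PERSON"), "WON", ("Nobel Prize in Physics", "NOBEL_PRIZE")),
    (("Marie Curie", "PERSON"), "DISCOVERED", ("radium", "DISCOVERY")),
    (("radium", "DISCOVERY"), "DISCOVERED", ("Marie Curie", "PERSON")),  # wrong direction
    (("Marie Curie", "PERSON"), "WORKED_WITH", ("Paris", "LOCATION")),   # wrong object type
]

for triple in filter_by_schema(candidates, validation_schema):
    print(triple)
```

Because the schema triples are directional, a relation extracted backwards is rejected just like one with the wrong entity types.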

In [7]:
# Set up the property graph index
kg_index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embed_model,
    kg_extractors=[schema_path_extractor],
    property_graph_store=graph_store,
    show_progress=True,
)

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 949.37it/s]
Extracting paths from text with schema: 100%|██████████| 1/1 [00:08<00:00,  8.02s/it]


🍩 https://wandb.ai/alonso-silva/llamaindex_demo/r/call/01921e20-28e3-7931-810f-b1f0054d80f6


Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  1.19it/s]
Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  1.43it/s]


Now that the graph is created, we can explore it in [Kùzu Explorer](https://github.com/kuzudb/explorer), a web-based UI, by running a Docker container that pulls the latest image of Kùzu Explorer as follows:

```bash
docker run -p 8000:8000 \
           -v ./test_kuzudb:/database \
           -e MODE=READ_ONLY \
           --rm kuzudb/explorer:latest
```

Then launch the UI by visiting http://localhost:8000/.

The easiest way to see the entire graph is to use a Cypher query like `MATCH (a)-[b]->(c) RETURN * LIMIT 100`.
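The same query can also be run from Python instead of the Explorer UI. The helper below is a small sketch: `run_cypher` is a hypothetical name, and it only assumes a connection object that behaves like `kuzu.Connection`, that is, `execute()` returns a result exposing `has_next()` and `get_next()`.

```python
from typing import Any, List


def run_cypher(conn: Any, query: str) -> List[list]:
    """Run a Cypher query and collect every returned row into a list.

    `conn` is expected to behave like a `kuzu.Connection`: its `execute()`
    method returns a result object with `has_next()` and `get_next()`.
    """
    result = conn.execute(query)
    rows = []
    while result.has_next():
        rows.append(result.get_next())
    return rows


# Example usage against the notebook's database (requires the `kuzu` package
# and the `db` object created earlier):
#   conn = kuzu.Connection(db)
#   for row in run_cypher(conn, "MATCH (a)-[b]->(c) RETURN * LIMIT 100"):
#       print(row)
```

Collecting rows into a list is convenient for a graph this small; for larger results, iterating over the result object directly avoids holding everything in memory.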

For this dataset, the graph constructed looks as follows:

![](./assets/kuzu_graph_rag.png)

The dataset is about the scientist Marie Curie and her discoveries, as well as her direct and indirect relationships to people such as Pierre Curie, Paul Langevin and Albert Einstein. The graph has an explicit schema, specified by us. It captures entities from the unstructured data, such as "Polonium", "Radium" and "Nobel Prize in Physics", along with edges representing the relationships between these entities.

## Importance of graph quality

Graph construction is a critical step in the process of building a Graph RAG system. The quality of the graph will directly impact the quality of the results. In this notebook, we use a simple example that leverages an LLM to demonstrate the idea. In practice, you would use more sophisticated methods to construct a knowledge graph, such as custom ML models or APIs (REBEL, GLiNER/GLiREL, Diffbot, WhyHow Knowledge Graph Studio, etc.).

The key is to _persist_ the graph in a graph database, so that it can be managed and queried efficiently. Kùzu is a great choice for this purpose, as it is an open source, embedded graph database that is easy to use and deploy.

The LLM-generated graph can be incomplete, noisy, or contain errors, so it is important to clean and refine it. This process is called "graph curation" and is essential for the quality of the results. The following cell explicitly adds specific nodes and relationships to the already-existing knowledge graph and persists them to the Kùzu database that sits on disk.

In [22]:
from llama_index.core.graph_stores.types import Relation, EntityNode

graph_store.upsert_nodes(
    [
        EntityNode(label="PERSON", name="Jacques Curie"),
    ]
)

graph_store.upsert_relations(
    [
        Relation(
            label="WORKED_WITH",
            source_id="Pierre Curie",
            target_id="Paul Langevin",
        ),
        Relation(
            label="DISCOVERED",
            source_id="Jacques Curie",
            target_id="piezoelectricity",
        ),
    ]
)
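One simple curation check, sketched below in plain Python, is to look for relation endpoints that do not yet exist as nodes before upserting. Everything here is hypothetical scaffolding: `CuratedRelation` is a stand-in dataclass rather than LlamaIndex's `Relation`, and the `known` set is an assumed snapshot of node ids already in the graph.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class CuratedRelation:
    """Hypothetical stand-in for a relation we plan to upsert."""
    label: str
    source_id: str
    target_id: str


def missing_endpoints(
    relations: Iterable[CuratedRelation],
    known_node_ids: Set[str],
) -> List[str]:
    """Return endpoint ids used by `relations` that are not known nodes, in first-seen order."""
    missing: List[str] = []
    for relation in relations:
        for node_id in (relation.source_id, relation.target_id):
            if node_id not in known_node_ids and node_id not in missing:
                missing.append(node_id)
    return missing


# Assumed set of node ids present in the graph after LLM extraction.
known = {"Marie Curie", "Pierre Curie", "Paul Langevin", "piezoelectricity"}

planned = [
    CuratedRelation("WORKED_WITH", "Pierre Curie", "Paul Langevin"),
    CuratedRelation("DISCOVERED", "Jacques Curie", "piezoelectricity"),
]

print(missing_endpoints(planned, known))  # ['Jacques Curie']
```

Under these assumptions the check flags "Jacques Curie", which is consistent with the cell above adding that node before the relation that refers to it.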

In [23]:
kg_index = PropertyGraphIndex.from_documents(
    documents,
    embed_model=embed_model,
    kg_extractors=[schema_path_extractor],
    property_graph_store=graph_store,
    show_progress=True,
)

Parsing nodes: 100%|██████████| 1/1 [00:00<00:00, 437.32it/s]
Extracting paths from text with schema: 100%|██████████| 1/1 [00:04<00:00,  4.71s/it]


🍩 https://wandb.ai/alonso-silva/llamaindex_demo/r/call/01921e33-b6b5-7f41-b6a3-6aa9941e7534


Generating embeddings: 100%|██████████| 1/1 [00:00<00:00,  1.08it/s]
Generating embeddings: 0it [00:00, ?it/s]


In [24]:
kg_retriever = kg_index.as_retriever()

In [25]:
kg_query_engine = kg_index.as_query_engine(include_text=True)

In [26]:
response = kg_query_engine.query("Who discovered Piezoelectricity?")

In [27]:
for i in range(len(response.source_nodes)):
    print(response.source_nodes[i].node.text)
    print()

Marie Curie ({'id': 'Marie Curie', 'text': None, 'label': 'PERSON', 'embedding': None, 'creation_date': datetime.date(2024, 9, 21), 'last_modified_date': datetime.date(2024, 9, 21), 'file_name': 'curie.txt', 'file_path': '/home/asilva/quarto/2024-PyData-Paris/kuzu-graph-rag/data/curie/curie.txt', 'file_size': 1830, 'file_type': 'text/plain', 'ref_doc_id': None, 'triplet_source_id': '21fca950-5574-4aa1-806c-ad7db40e3290'}) -> DISCOVERED -> radium ({'id': 'Marie Curie', 'text': None, 'label': 'PERSON', 'embedding': None, 'creation_date': datetime.date(2024, 9, 21), 'last_modified_date': datetime.date(2024, 9, 21), 'file_name': 'curie.txt', 'file_path': '/home/asilva/quarto/2024-PyData-Paris/kuzu-graph-rag/data/curie/curie.txt', 'file_size': 1830, 'file_type': 'text/plain', 'ref_doc_id': None, 'triplet_source_id': '21fca950-5574-4aa1-806c-ad7db40e3290'})

Marie Curie ({'id': 'Marie Curie', 'text': None, 'label': 'PERSON', 'embedding': None, 'creation_date': datetime.date(2024, 9, 21), 'la

In [21]:
response

Response(response='Jacques Curie discovered Piezoelectricity.', source_nodes=[NodeWithScore(node=TextNode(id_='912e722b-2bb8-45b8-8c22-afa99cf8d368', embedding=None, metadata={}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={<NodeRelationship.SOURCE: '1'>: RelatedNodeInfo(node_id='21fca950-5574-4aa1-806c-ad7db40e3290', node_type=None, metadata={}, hash=None)}, text="Marie Curie ({'id': 'Marie Curie', 'text': None, 'label': 'PERSON', 'embedding': None, 'creation_date': datetime.date(2024, 9, 21), 'last_modified_date': datetime.date(2024, 9, 21), 'file_name': 'curie.txt', 'file_path': '/home/asilva/quarto/2024-PyData-Paris/kuzu-graph-rag/data/curie/curie.txt', 'file_size': 1830, 'file_type': 'text/plain', 'ref_doc_id': None, 'triplet_source_id': '21fca950-5574-4aa1-806c-ad7db40e3290'}) -> DISCOVERED -> radium ({'id': 'Marie Curie', 'text': None, 'label': 'PERSON', 'embedding': None, 'creation_date': datetime.date(2024, 9, 21), 'last_modified_date': dateti

In [29]:
response.response

'Pierre Curie and Jacques Curie discovered piezoelectricity.'

In [28]:
print(str(response))

Pierre Curie and Jacques Curie discovered piezoelectricity.


In [30]:
response = kg_query_engine.query("Who did Pierre Curie work with?")
print(str(response))

Pierre Curie worked with Jacques Curie and Paul Langevin.


The two explicit relationships we are interested in are:
- Pierre Curie worked with his brother Jacques to discover piezoelectricity.
- Paul Langevin was a student of Pierre Curie, which can be interpreted as a "worked with" relationship.

Explicitly modeling this and storing this in the graph allowed the information to be retrieved, providing the right context to the generation model downstream.

## Takeaways from graph-only retrieval

It can be seen by inspecting the raw data that the LLM-extracted graph is incomplete. Once the right nodes/relationships are added to the graph, the quality of the graph-based retrieval improves significantly. This did require some manual curation, but we will demonstrate below that this process is worth it, by trying to answer the **same** questions using vector-only retrieval.

## Part 2: Vector-only retrieval

This part demonstrates how to embed the same documents and store the resulting vectors in [LanceDB](https://lancedb.com/), an open source, embedded vector database.

In [12]:
# We'll use LanceDB to perform vector similarity search
shutil.rmtree("./test_lancedb", ignore_errors=True)

from llama_index.core import StorageContext
from llama_index.core import VectorStoreIndex
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.vector_stores.lancedb import LanceDBVectorStore

import openai

openai.api_key = OPENAI_API_KEY


In [11]:
vector_store = LanceDBVectorStore(
    uri="./test_lancedb",
    mode="overwrite",
)
storage_context = StorageContext.from_defaults(vector_store=vector_store)

vector_index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
    embed_model=OpenAIEmbedding(model_name="text-embedding-3-small"),
    llm=OpenAI(model="gpt-4o-mini", temperature=0.3),
)

[2024-09-17T20:21:20Z WARN  lance::dataset] No existing dataset at /Users/prrao/code/kuzu-graph-rag/test_lancedb/vectors.lance, it will be created


In [12]:
vector_retriever = vector_index.as_retriever(similarity_top_k=4)
vector_query_engine = RetrieverQueryEngine(vector_retriever)

response = vector_query_engine.query("Who discovered Piezoelectricty?")
str(response)

'Pierre Curie discovered piezoelectricity.'

In [13]:
response = vector_query_engine.query("Who did Pierre Curie work with?")
str(response)

'Pierre Curie worked with his brother Jacques in discovering piezoelectricity.'

The implicit relationship "was a student of" isn't close enough to "worked with" in vector space. This leads the vector search to miss the relationship between Pierre Curie and Paul Langevin (who was his student, meaning that they worked together). Using the graph as shown earlier, we were able to explicitly define and capture this relationship, allowing the graph-based retrieval to provide the generation model with a slightly better context.

## Takeaways from vector-only retrieval

Due to the nature of the data and the questions being asked, the vector-only retrieval obtains _partial_ answers to the questions. This is because the vector embeddings are not able to capture the deeper relationships between the entities in the text. This is where the graph-based retrieval provides value, as it can capture these relationships and provide more accurate answers.

However, this is **not** to say that graph retrieval is better than vector retrieval - in many cases, semantic similarity can help narrow down the search space and provide useful insights. The aim of Graph RAG is to combine the two methods, which we will demonstrate in the next section.

## Part 3: Combining graph and vector retrieval to build a Graph RAG system

In this part, we will demonstrate how to combine graph and vector retrieval and rerank the results in order to provide better context to the LLM prior to generating the response. We will use the aforementioned Kùzu and LanceDB databases to achieve this.

In [14]:
from llama_index.core.retrievers import BaseRetriever
from llama_index.core.schema import NodeWithScore, QueryBundle
from llama_index.postprocessor.cohere_rerank.base import CohereRerank


class CustomRerankerRetriever(BaseRetriever):
    """Custom retriever that merges graph and vector results and reranks them with Cohere."""
    def __init__(
            self,
            kg_retriever,
            vector_retriever,
            cohere_api_key: Optional[str] = None,
            cohere_top_n: int = 2,
        ):
        self._kg_retriever = kg_retriever
        self._vector_retriever = vector_retriever
        self._reranker = CohereRerank(
            api_key=cohere_api_key, top_n=cohere_top_n
        )
        # Initialize the BaseRetriever state (e.g. the callback manager)
        super().__init__()

    def _retrieve(self, query_bundle: QueryBundle) -> List[NodeWithScore]:
        """Retrieve from both stores, rerank the combined nodes, then deduplicate."""
        # Query both retrievers with the same query
        vector_retrieval_nodes = self._vector_retriever.retrieve(query_bundle)
        kg_retrieval_nodes = self._kg_retriever.retrieve(query_bundle)
        combined_nodes = vector_retrieval_nodes + kg_retrieval_nodes
        # Keep only the top_n most relevant nodes according to the reranker
        reranked_nodes = self._reranker.postprocess_nodes(
            combined_nodes,
            query_bundle=query_bundle,
        )
        # Drop any node that was returned by both retrievers
        unique_nodes = {n.node_id: n for n in reranked_nodes}
        return list(unique_nodes.values())

In [15]:
custom_reranker_retriever = CustomRerankerRetriever(
    kg_retriever,
    vector_retriever,
    cohere_api_key=COHERE_API_KEY,
    cohere_top_n=2,
)

In [16]:
# Set the global LLM that the query engine uses to generate the response
Settings.llm = generation_llm

custom_reranker_query_engine = RetrieverQueryEngine(custom_reranker_retriever)

response = custom_reranker_query_engine.query("Who did Pierre Curie work with?")
print(str(response))

Pierre Curie worked with Paul Langevin and Jacques Curie.


The custom retriever was able to use the context from both the graph and the vector retrievals to provide the correct answer to the question. The raw text only says that Paul Langevin was Pierre Curie's student, but the knowledge graph explicitly stores this via the relationship `(:PERSON {name: "Pierre Curie"})-[:WORKED_WITH]->(:PERSON {name: "Paul Langevin"})`, which the reranker retriever was able to leverage from the given context.
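The merge-then-rerank flow can be sketched without any external service. In the toy version below, a simple token-overlap score stands in for the Cohere reranker, and `ScoredNode`, `token_overlap` and `merge_and_rerank` are hypothetical names introduced only for illustration. Unlike the retriever above, this sketch deduplicates before reranking, so that a node returned by both retrievers cannot occupy two of the `top_n` slots.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ScoredNode:
    node_id: str
    text: str
    score: float = 0.0


def token_overlap(query: str, text: str) -> float:
    """Toy relevance score: fraction of query tokens that also appear in the text."""
    query_tokens = set(query.lower().split())
    text_tokens = set(text.lower().split())
    if not query_tokens:
        return 0.0
    return len(query_tokens & text_tokens) / len(query_tokens)


def merge_and_rerank(
    query: str,
    vector_nodes: List[ScoredNode],
    graph_nodes: List[ScoredNode],
    top_n: int = 2,
    scorer: Callable[[str, str], float] = token_overlap,
) -> List[ScoredNode]:
    """Merge both result lists, drop duplicate ids, rescore, and keep the best top_n."""
    unique: Dict[str, ScoredNode] = {}
    for node in vector_nodes + graph_nodes:
        unique.setdefault(node.node_id, node)  # keep the first occurrence of each id
    rescored = [ScoredNode(n.node_id, n.text, scorer(query, n.text)) for n in unique.values()]
    rescored.sort(key=lambda n: n.score, reverse=True)
    return rescored[:top_n]


# Made-up retrieval results; "v1" is returned by both retrievers.
vector_nodes = [ScoredNode("v1", "pierre curie and his brother jacques studied piezoelectricity")]
graph_nodes = [
    ScoredNode("g1", "pierre curie worked with paul langevin"),
    ScoredNode("g2", "marie curie discovered radium"),
    ScoredNode("v1", "pierre curie and his brother jacques studied piezoelectricity"),
]

for node in merge_and_rerank("who did pierre curie work with", vector_nodes, graph_nodes):
    print(node.node_id, round(node.score, 2), node.text)
```

A real reranker scores query and passage with a trained model rather than token overlap, but the surrounding plumbing (merge, deduplicate, score, truncate) has the same shape.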

## Takeaways from a Graph RAG perspective

As can be seen from the above example, combining graph and vector retrieval can provide more accurate and contextually relevant answers to the questions. This is because the graph-based retrieval can capture the relationships between the entities in the text, while the vector-based retrieval can provide semantic similarity between the entities. By combining the two, we can build a powerful system that can answer complex queries that require both semantic similarity and relationship traversal, with a reranker retriever that can leverage the context from both retrievals to provide a more relevant answer.

In practice, the Graph RAG system can be used to answer a wide range of questions, such as factual questions, definition questions, and reasoning questions. The key is to build a high-quality knowledge graph, and to combine it with vector search in a way that provides the most relevant and accurate answers to the questions.

## Conclusions

Graph RAG can be thought of as a suite of methodologies that combine the power of knowledge graphs with the benefits of vector search. Databases like Kùzu and LanceDB, due to their ease of use, developer friendliness and permissive licensing, are great choices for building a Graph RAG system.