# Using GraphRAG-SDK to Create a Knowledge Graph and RAG System from Unstructured Documents

GraphRAG-SDK uses an LLM to build a Retrieval-Augmented Generation (RAG) system on top of a knowledge graph. This example loads UFC HTML files, automatically detects an ontology from 10% of the files, and creates a Knowledge Graph (KG) that the RAG system can answer questions from.

In [25]:
!pip install graphrag_sdk



In [16]:
!sudo apt-get update
!sudo apt-get install redis-server


In [17]:
!sudo /etc/init.d/redis-server start

Starting redis-server: redis-server.


In [6]:
import os
import json
import random
from dotenv import load_dotenv
from graphrag_sdk.source import Source
from graphrag_sdk import KnowledgeGraph, Ontology
from graphrag_sdk.models.gemini import GeminiGenerativeModel
from graphrag_sdk.model_config import KnowledgeGraphModelConfig

# Load environment variables
load_dotenv()

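# Configuration: the Gemini model reads the Google API key from GOOGLE_API_KEY.
# Keep the key in a .env file (loaded above) instead of hardcoding it here.
if not os.environ.get("GOOGLE_API_KEY"):
    raise RuntimeError("GOOGLE_API_KEY is not set; add it to your .env file")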


### Import Source Data from Disk

This example uses UFC HTML files as the source data. We will import these files as `Source` objects.

In [26]:
# Data folder.
src_files = "./input"
sources = []

# For each file in the source directory, create a new Source object.
for file in os.listdir(src_files):
    sources.append(Source(os.path.join(src_files, file)))
print(f"Loaded {len(sources)} sources.")

Loaded 1 sources.
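
`os.listdir` returns every entry in the folder, including hidden files and subdirectories, so a stray file becomes a `Source` too. A minimal sketch of a stricter listing follows, assuming the inputs end in `.html` or `.htm`; the helper name `list_source_paths` is hypothetical and not part of GraphRAG-SDK.

```python
from pathlib import Path


def list_source_paths(folder, suffixes=(".html", ".htm")):
    """Return sorted paths of regular files in `folder` with an allowed suffix."""
    allowed = {s.lower() for s in suffixes}
    return sorted(
        str(path)
        for path in Path(folder).iterdir()
        if path.is_file() and path.suffix.lower() in allowed
    )
```

Each returned path can then be wrapped as before, for example `sources = [Source(p) for p in list_source_paths(src_files)]`.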


### Ontology from the Sources

Next, we use an LLM to automatically extract an ontology from a sample of the data (10%, with a minimum of one file) and save it as a JSON file for manual review. We also pass `boundaries` to the ontology detection step to steer it toward the desired entities.

In [13]:
# Define the percentage of files that will be used to auto-create the ontology.
# Ensure at least 1 source is selected even with small percentage
percent = 0.1  # Adjust as needed, but consider the dataset size

boundaries = """
    Extract only the most relevant information about UFC fighters, fights, and events.
    Avoid creating entities for details that can be expressed as attributes.
"""

# Define the model to be used for the ontology
model = GeminiGenerativeModel(model_name="gemini-1.5-pro-latest")

# Randomly select a percentage of files from sources,
# but ensure at least one source is selected
num_samples = max(1, round(len(sources) * percent))  # Ensure at least 1 sample
sampled_sources = random.sample(sources, num_samples)

ontology = Ontology.from_sources(
    sources=sampled_sources,
    boundaries=boundaries,
    model=model,
)

# Save the ontology to the disk as a json file.
with open("ontology.json", "w", encoding="utf-8") as file:
    file.write(json.dumps(ontology.to_json(), indent=2))

ERROR:tornado.access:503 POST /v1beta/models/gemini-1.5-pro-latest:generateContent?%24alt=json%3Benum-encoding%3Dint (127.0.0.1) 6046.76ms
DEBUG:graphrag_sdk.steps.create_ontology_step:Model response: GenerationResponse(text={"entities":[{"label":"Loan","attributes":[{"name":"loan_type","type":"string","unique":true,"required":true},{"name":"purpose","type":"string","unique":false,"required":true},{"name":"interest_rate_excellent","type":"string","unique":false,"required":false},{"name":"interest_rate_good","type":"string","unique":false,"required":false},{"name":"interest_rate_fair","type":"string","unique":false,"required":false},{"name":"interest_rate_poor","type":"string","unique":false,"required":false},{"name":"interest_rate_very_poor","type":"string","unique":false,"required":false}]},{"label":"Website","attributes":[{"name":"name","type":"string","unique":true,"required":true}]},{"label":"FinancialInstitution","attributes":[{"name":"name","type":"string","unique":true,"required
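
The sample size in the cell above is `max(1, round(len(sources) * percent))`. Python's `round` rounds exact halves to the nearest even integer, so the count does not always match intuition. The sketch below tabulates it for a few corpus sizes; the helper name `sample_size` is hypothetical.

```python
def sample_size(num_sources, percent):
    """Number of sources to sample: the rounded share, but never fewer than one."""
    return max(1, round(num_sources * percent))


# Print the sample size for several corpus sizes at 10%.
for n in (1, 5, 15, 25, 100):
    print(n, sample_size(n, 0.1))
```

With `percent = 0.1`, 25 sources yield 2 samples rather than 3. If rounding up is preferred, `math.ceil` can be used in place of `round`.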

### Review the Ontology

In [14]:
print(json.dumps(ontology.to_json(), indent=4))

{
    "entities": [
        {
            "label": "Loan",
            "attributes": [
                {
                    "name": "loan_type",
                    "type": "string",
                    "unique": true,
                    "required": true
                },
                {
                    "name": "purpose",
                    "type": "string",
                    "unique": false,
                    "required": true
                },
                {
                    "name": "interest_rate_excellent",
                    "type": "string",
                    "unique": false,
                    "required": false
                },
                {
                    "name": "interest_rate_good",
                    "type": "string",
                    "unique": false,
                    "required": false
                },
                {
                    "name": "interest_rate_fair",
                    "type": "string",
                    "uniq

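The printed JSON grows quickly, so a compact summary can make the manual review easier. The sketch below assumes only the structure visible in the output above (an `entities` list whose items carry a `label` and a list of `attributes` with `name`, `unique`, and `required` fields); the function name `summarize_entities` is hypothetical. Note that the labels shown here (`Loan`, `Website`, `FinancialInstitution`) do not match the UFC boundaries, which suggests the file in `./input` may not be UFC data and is worth checking before building the graph.

```python
def summarize_entities(ontology_json):
    """Map each entity label to its unique and required attribute names."""
    summary = {}
    for entity in ontology_json.get("entities", []):
        attributes = entity.get("attributes", [])
        summary[entity["label"]] = {
            "unique": [a["name"] for a in attributes if a.get("unique")],
            "required": [a["name"] for a in attributes if a.get("required")],
        }
    return summary


# A fragment copied from the ontology printed above.
example = {
    "entities": [
        {
            "label": "Loan",
            "attributes": [
                {"name": "loan_type", "type": "string", "unique": True, "required": True},
                {"name": "purpose", "type": "string", "unique": False, "required": True},
                {"name": "interest_rate_good", "type": "string", "unique": False, "required": False},
            ],
        }
    ]
}
print(summarize_entities(example))
```

In the notebook this could be called as `summarize_entities(ontology.to_json())`.
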
### KG from Sources and Ontology

After reviewing the ontology, we will load it and use it to create a Knowledge Graph (KG) from the sources.

In [18]:
# After approving the ontology, load it from disk.
ontology_file = "ontology.json"
with open(ontology_file, "r", encoding="utf-8") as file:
    ontology = Ontology.from_json(json.loads(file.read()))

kg = KnowledgeGraph(
    name="ufc",
    model_config=KnowledgeGraphModelConfig.with_model(model),
    ontology=ontology,
)
kg.process_sources(sources)

DEBUG:graphrag_sdk.steps.extract_data_step:Processing 1 documents
DEBUG:graphrag_sdk.steps.extract_data_step:Processing task: extract_data_step_6b8b132d-1650-4a34-98d0-e43a3f117e88
DEBUG:extract_data_step_6b8b132d-1650-4a34-98d0-e43a3f117e88:Processing task: extract_data_step_6b8b132d-1650-4a34-98d0-e43a3f117e88
DEBUG:extract_data_step_6b8b132d-1650-4a34-98d0-e43a3f117e88:User message:  You are tasked with extracting entities and relations from the text below, using the ontology provided.  **Output Format:**  - Provide the extracted data as a JSON object with two keys: `"entities"` and `"relations"`.  - **Entities**: Represent entities and concepts. Each entity should have a `"label"` and `"attributes"` field.  - **Relations**: Represent relations between entities or concepts. Each relation should have a `"label"`, `"source"`, `"target"`, and `"attributes"` field.  **Guidelines:** - **Extract all entities and relations**: Capture all entities and relations mentioned in the text.  - **U

### Graph RAG

At this point, we have a Knowledge Graph built from our data, and we can use it in our GraphRAG system. Use the `chat_session` method to start a conversation.

In [19]:
# Conversation.
chat = kg.chat_session()
response = chat.send_message("What are the loans avialable?")
# print(response)
# response = chat.send_message("Tell me about one of his fights?")
print(response)

DEBUG:graphrag_sdk.steps.graph_query_step:Cypher Prompt: 
Using the ontology provided, generate an OpenCypher statement to query the graph database returning all relevant entities, relationships, and attributes to answer the question below:
If you cannot generate a OpenCypher statement for any reason, return an empty string.
Respect the order of the relationships, the arrows should always point from the "source" to the "target".

Question: What are the loans avialable?

DEBUG:graphrag_sdk.steps.graph_query_step:Cypher Statement Response: GenerationResponse(text=```cypher
MATCH (l:Loan)
RETURN l
```
, finish_reason=STOP)
DEBUG:graphrag_sdk.steps.graph_query_step:Cypher: 
MATCH (l:Loan)
RETURN l

DEBUG:graphrag_sdk.steps.graph_query_step:Error: unknown command `GRAPH.QUERY`, with args beginning with: `ufc`, ` MATCH (l:Loan) RETURN l `, `--compact`, 
DEBUG:graphrag_sdk.steps.graph_query_step:Cypher Prompt: 
The Cypher statement above failed with the following error:
"unknown command `GRAP

Exception: Failed to generate Cypher query: unknown command `GRAPH.QUERY`, with args beginning with: `ufc`, ` MATCH (l:Loan) RETURN l `, `--compact`, 
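
The `unknown command` text is what a Redis server replies when it does not recognize a command. `GRAPH.QUERY` is not a core Redis command; it is provided by a graph module such as FalkorDB (or the older RedisGraph), so the stock `redis-server` package installed earlier likely cannot serve the SDK's queries. A minimal sketch of a check that separates this setup problem from a genuinely bad Cypher statement follows; the function name `is_missing_graph_module` is hypothetical.

```python
def is_missing_graph_module(error_text):
    """Return True when an error message indicates the server lacks GRAPH.* commands."""
    text = error_text.lower()
    return "unknown command" in text and "graph." in text


# The message reported in the log above.
message = "unknown command `GRAPH.QUERY`, with args beginning with: `ufc`"
print(is_missing_graph_module(message))
```

When the check is true, retrying the question or regenerating the Cypher will not help; the database behind the SDK has to be one that supports `GRAPH.QUERY`.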

In [20]:
!pip install neo4j

Collecting neo4j
  Downloading neo4j-5.25.0-py3-none-any.whl.metadata (5.7 kB)
Downloading neo4j-5.25.0-py3-none-any.whl (296 kB)
Installing collected packages: neo4j
Successfully installed neo4j-5.25.0


In [24]:
from neo4j import GraphDatabase
from graphrag_sdk import KnowledgeGraph, KnowledgeGraphModelConfig, Ontology

# ... (your existing code to load the ontology and model) ...

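# Initialize the Neo4j driver from environment variables rather than hardcoded
# credentials. NEO4J_URI and NEO4J_PASSWORD are assumed names; set them in .env.
# Note: the driver is not passed to KnowledgeGraph below (the `driver=` argument
# is commented out), so the graph backend used by the SDK is unchanged.
driver = GraphDatabase.driver(
    os.environ["NEO4J_URI"],
    auth=("neo4j", os.environ["NEO4J_PASSWORD"]),
)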

kg = KnowledgeGraph(
    name="ufc",
    model_config=KnowledgeGraphModelConfig.with_model(model),
    ontology=ontology,
    # driver=driver # Pass the driver to the KnowledgeGraph
)

# with driver.session() as session:
#     session.run("CREATE (n:Person {name: 'Alice'})")

kg.process_sources(sources)


DEBUG:graphrag_sdk.steps.extract_data_step:Processing 1 documents
DEBUG:graphrag_sdk.steps.extract_data_step:Processing task: extract_data_step_74b026a6-3b09-4417-8725-426ac7376d67
DEBUG:extract_data_step_74b026a6-3b09-4417-8725-426ac7376d67:Processing task: extract_data_step_74b026a6-3b09-4417-8725-426ac7376d67
DEBUG:extract_data_step_74b026a6-3b09-4417-8725-426ac7376d67:User message:  You are tasked with extracting entities and relations from the text below, using the ontology provided.  **Output Format:**  - Provide the extracted data as a JSON object with two keys: `"entities"` and `"relations"`.  - **Entities**: Represent entities and concepts. Each entity should have a `"label"` and `"attributes"` field.  - **Relations**: Represent relations between entities or concepts. Each relation should have a `"label"`, `"source"`, `"target"`, and `"attributes"` field.  **Guidelines:** - **Extract all entities and relations**: Capture all entities and relations mentioned in the text.  - **U

In [28]:
chat = kg.chat_session()
response = chat.send_message("What are the loans avialable?")

DEBUG:graphrag_sdk.steps.graph_query_step:Cypher Prompt: 
Using the ontology provided, generate an OpenCypher statement to query the graph database returning all relevant entities, relationships, and attributes to answer the question below:
If you cannot generate a OpenCypher statement for any reason, return an empty string.
Respect the order of the relationships, the arrows should always point from the "source" to the "target".

Question: What are the loans avialable?

DEBUG:graphrag_sdk.steps.graph_query_step:Cypher Statement Response: GenerationResponse(text=```cypher
MATCH (l:Loan)
RETURN l
```
, finish_reason=STOP)
DEBUG:graphrag_sdk.steps.graph_query_step:Cypher: 
MATCH (l:Loan)
RETURN l

DEBUG:graphrag_sdk.steps.graph_query_step:Error: unknown command `GRAPH.QUERY`, with args beginning with: `ufc`, ` MATCH (l:Loan) RETURN l `, `--compact`, 
DEBUG:graphrag_sdk.steps.graph_query_step:Cypher Prompt: 
The Cypher statement above failed with the following error:
"unknown command `GRAP

Exception: Failed to generate Cypher query: 429 POST https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-latest:generateContent?%24alt=json%3Benum-encoding%3Dint: Resource has been exhausted (e.g. check quota).
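
The `429 ... Resource has been exhausted` response means the Gemini API quota or rate limit was hit, so repeating the call immediately tends to fail again. One common mitigation is to retry with exponentially increasing delays. The sketch below is generic and not part of GraphRAG-SDK; `retry_with_backoff` and its parameters are hypothetical names, and it retries only on exceptions whose message contains `429`, which is an assumption about how the error surfaces.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=2.0, sleep=time.sleep):
    """Call func(); on a rate-limit error wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception as exc:
            is_last = attempt == max_attempts - 1
            # Re-raise anything that is not a rate-limit error, or the final failure.
            if "429" not in str(exc) or is_last:
                raise
            sleep(base_delay * 2 ** attempt)
```

A call such as `retry_with_backoff(lambda: chat.send_message("..."))` would wait 2, 4, and 8 seconds between attempts with the defaults. It does not address the underlying `GRAPH.QUERY` error, which still fails on every attempt.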

In [31]:
import functools

@functools.lru_cache(maxsize=128)  # Cache up to 128 responses
def send_message_with_cache(message):
    return chat.send_message(message)

# Example usage
response = send_message_with_cache("What are the loans avialable?")
print(response)

DEBUG:graphrag_sdk.steps.graph_query_step:Cypher Prompt: 
Using the ontology provided, generate an OpenCypher statement to query the graph database returning all relevant entities, relationships, and attributes to answer the question below:
If you cannot generate a OpenCypher statement for any reason, return an empty string.
Respect the order of the relationships, the arrows should always point from the "source" to the "target".

Question: What are the loans avialable?

DEBUG:graphrag_sdk.steps.graph_query_step:Cypher Statement Response: GenerationResponse(text=```cypher
MATCH (l:Loan)
RETURN l
```
, finish_reason=STOP)
DEBUG:graphrag_sdk.steps.graph_query_step:Cypher: 
MATCH (l:Loan)
RETURN l

DEBUG:graphrag_sdk.steps.graph_query_step:Error: unknown command `GRAPH.QUERY`, with args beginning with: `ufc`, ` MATCH (l:Loan) RETURN l `, `--compact`, 
DEBUG:graphrag_sdk.steps.graph_query_step:Cypher Prompt: 
The Cypher statement above failed with the following error:
"unknown command `GRAP

Exception: Failed to generate Cypher query: unknown command `GRAPH.QUERY`, with args beginning with: `ufc`, ` MATCH (l:Loan) RETURN l `, `--compact`, 
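
`functools.lru_cache` stores only return values. A call that raises is not cached, so wrapping `chat.send_message` cannot hide the `GRAPH.QUERY` failure: every call reaches the model and the database again. The sketch below demonstrates this with a stand-in function; `flaky_lookup` is hypothetical and replaces the real chat call.

```python
import functools

call_count = 0


@functools.lru_cache(maxsize=128)
def flaky_lookup(message):
    """Stand-in for chat.send_message that fails on messages starting with '!'."""
    global call_count
    call_count += 1
    if message.startswith("!"):
        raise RuntimeError("backend error")
    return f"answer to {message}"


flaky_lookup("hello")
flaky_lookup("hello")          # served from the cache, no second call
for _ in range(2):
    try:
        flaky_lookup("!fail")  # raises both times, nothing is cached
    except RuntimeError:
        pass
print(call_count)
```

The counter shows that the failing message invoked the function on every attempt, while the successful one ran only once.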