# Using GraphRAG-SDK to Create a Knowledge Graph and RAG System from Unstructured Documents

GraphRAG-SDK uses LLMs to build a Retrieval-Augmented Generation (RAG) system on top of a knowledge graph. This example loads UFC HTML files, automatically detects an ontology from a 10% sample of the files, and builds a Knowledge Graph (KG) that can answer questions about the data.

In [15]:
import os
import json
import random
from falkordb import FalkorDB
from dotenv import load_dotenv
from graphrag_sdk.classes.source import Source
from graphrag_sdk import KnowledgeGraph, Ontology
from graphrag_sdk.models.openai import OpenAiGenerativeModel
from graphrag_sdk.classes.model_config import KnowledgeGraphModelConfig
load_dotenv()

# Configuration
# OPENAI_API_KEY = "sk-"  # OpenAI API key

True
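
A missing API key usually only surfaces at the first model call, so it can help to fail early. The following is a minimal sketch, assuming the OpenAI model reads its key from the `OPENAI_API_KEY` environment variable (as the commented configuration line suggests). The helper name `require_env` is hypothetical and not part of GraphRAG-SDK.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it first."
        )
    return value
```

Calling `require_env("OPENAI_API_KEY")` right after `load_dotenv()` would stop the notebook before any sources are processed if the key is absent.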

### Import Source Data from Disk

This example uses UFC HTML files as the source data. We will import these files as `Source` objects.

In [16]:
# Data folder.
src_files = "data/fight"
sources = []

# For each file in the source directory, create a new Source object.
for file in os.listdir(src_files):
    sources.append(Source(os.path.join(src_files, file)))
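
The loop above turns every directory entry into a `Source`, including subfolders or stray non-HTML files. If the data folder is not perfectly clean, a filter like the one below may help. This is a sketch using only the standard library; `list_html_files` is a hypothetical helper, and the accepted extensions are an assumption.

```python
from pathlib import Path


def list_html_files(folder: str) -> list[str]:
    """Return sorted paths of the .html/.htm files directly inside `folder`."""
    root = Path(folder)
    return sorted(
        str(path)
        for path in root.iterdir()
        if path.is_file() and path.suffix.lower() in {".html", ".htm"}
    )
```

The result could then feed the same constructor used above, for example `sources = [Source(p) for p in list_html_files(src_files)]`. Sorting also makes the file order stable across runs.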

### Ontology from the Sources

Next, we use an LLM to automatically extract an ontology from a 10% sample of the data and save it as a JSON file for manual review. We also pass a `boundaries` prompt to the detection step to steer it toward the ontology we want.

In [17]:
# Define the percentage of files that will be used to auto-create the ontology.
percent = 0.1  # This represents 10%. You can adjust this value (e.g., 0.2 for 20%).

boundaries = """
    Extract only the most relevant information about UFC fighters, fights, and events.
    Avoid creating entities for details that can be expressed as attributes.
"""

# Define the model to be used for the ontology
model = OpenAiGenerativeModel(model_name="gpt-4o")

# Randomly select a percentage of files from sources (at least one file,
# so a small data folder does not produce an empty sample).
sampled_sources = random.sample(sources, max(1, round(len(sources) * percent)))

ontology = Ontology.from_sources(
    sources=sampled_sources,
    boundaries=boundaries,
    model=model,
)

# Save the ontology to disk as a JSON file.
with open("ontology.json", "w", encoding="utf-8") as file:
    json.dump(ontology.to_json(), file, indent=2)
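
Because the sample is random, each run may pick different files and therefore detect a slightly different ontology. The sketch below shows one way to make the sampling step reproducible with a fixed seed. `sample_fraction` is a hypothetical helper, not an SDK function, and the "at least one item" rule is a design choice made here.

```python
import random


def sample_fraction(items, fraction, seed=None):
    """Pick round(len(items) * fraction) items at random, but at least one."""
    if not items:
        return []
    count = max(1, min(len(items), round(len(items) * fraction)))
    rng = random.Random(seed)  # local generator, leaves the global random state alone
    return rng.sample(list(items), count)
```

With 50 source files and `fraction=0.1`, this selects 5 files, and the same seed yields the same 5 files on every run.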

In [18]:
print(ontology.to_json())

{'entities': [{'label': 'Person', 'attributes': [{'name': 'name', 'type': 'string', 'unique': True, 'required': True}, {'name': 'nickname', 'type': 'string', 'unique': False, 'required': False}, {'name': 'result', 'type': 'string', 'unique': False, 'required': False}], 'description': ''}, {'label': 'Event', 'attributes': [{'name': 'title', 'type': 'string', 'unique': True, 'required': True}, {'name': 'date', 'type': 'string', 'unique': False, 'required': True}, {'name': 'location', 'type': 'string', 'unique': False, 'required': True}], 'description': ''}, {'label': 'Referee', 'attributes': [{'name': 'name', 'type': 'string', 'unique': True, 'required': True}], 'description': ''}, {'label': 'FightStat', 'attributes': [{'name': 'fighter_name', 'type': 'string', 'unique': True, 'required': True}, {'name': 'knockdowns', 'type': 'number', 'unique': False, 'required': True}, {'name': 'significant_strikes', 'type': 'number', 'unique': False, 'required': True}, {'name': 'total_strikes', 'type'
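
The raw dictionary is hard to review by eye. The sketch below condenses it to one line per entity, assuming the structure visible in the printed output: an `entities` list whose items carry a `label` and a list of `attributes` with `name`, `type`, `unique`, and `required`. It is an illustration only; `summarize_entities` is a hypothetical helper, and other top-level keys of the ontology are not handled because the output above is truncated.

```python
def summarize_entities(ontology_dict):
    """Return one line per entity: label followed by its attributes."""
    lines = []
    for entity in ontology_dict.get("entities", []):
        attrs = []
        for attr in entity.get("attributes", []):
            # Mark attributes that are flagged as unique and/or required.
            flags = [f for f in ("unique", "required") if attr.get(f)]
            suffix = f" [{', '.join(flags)}]" if flags else ""
            attrs.append(f"{attr['name']}: {attr['type']}{suffix}")
        lines.append(f"{entity['label']} -> " + "; ".join(attrs))
    return lines
```

For example, `print("\n".join(summarize_entities(ontology.to_json())))` would list the `Referee` entity as `Referee -> name: string [unique, required]`.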

### KG from Sources and Ontology

After reviewing the ontology, we will load it and use it to create a Knowledge Graph (KG) from the sources.

In [19]:
# After approving the ontology, load it from disk.
ontology_file = "ontology.json"
with open(ontology_file, "r", encoding="utf-8") as file:
    ontology = Ontology.from_json(json.load(file))

kg = KnowledgeGraph(
    name="ufc",
    model_config=KnowledgeGraphModelConfig.with_model(model),
    ontology=ontology,
)
kg.process_sources(sources)

Entity with label FightStatistic not found in ontology
Entity with label FightStatistic not found in ontology
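
The warnings above report a label, `FightStatistic`, that is absent from the ontology, while the printed ontology contains a similarly named `FightStat` entity. One plausible reading is that the model produced a variant of an existing label during extraction. The sketch below uses the standard library's `difflib` to suggest near matches for such a label. `suggest_labels` is a hypothetical helper, and the 0.6 similarity cutoff is simply the `difflib` default.

```python
import difflib


def suggest_labels(missing_label, ontology_dict, cutoff=0.6):
    """Return ontology entity labels that closely resemble `missing_label`."""
    known = [entity["label"] for entity in ontology_dict.get("entities", [])]
    return difflib.get_close_matches(missing_label, known, n=3, cutoff=cutoff)
```

A suggestion like this can guide the manual review: either rename the entity in `ontology.json` or tighten the `boundaries` text, then rebuild the graph.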


### Graph RAG

At this point, we have a Knowledge Graph built from our data and can query it as a GraphRAG system. Use the `ask` method for single questions or the `chat_session` method for multi-turn conversations.

In [20]:
# Single question.
response = kg.ask("What were the last five fights? When were they? How many rounds did they have?")
print(response)
# Conversation.
chat = kg.chat_session()
response = chat.send_message("Who is Salsa Boy?")
print(response)
response = chat.send_message("How many fights has he participated in?")
print(response)

('The last five fights were all part of "UFC Fight Night: Lewis vs. Nascimento" held on May 11, 2024, in St. Louis, Missouri, USA. Each fight lasted for 3 rounds.', <graphrag_sdk.models.openai.OpenAiChatSession object at 0x75501cf6b510>)
Salsa Boy is Waldo Cortes-Acosta.
Waldo Cortes-Acosta has participated in 12 fights.
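
Note that the first printed line is a tuple, which suggests that `ask` returned the answer text together with a chat session object in this SDK version, whereas `send_message` returned a plain string. The sketch below extracts just the text in either case. `answer_text` is a hypothetical helper, and the tuple layout is an assumption inferred from the output above rather than from the SDK documentation.

```python
def answer_text(result):
    """Return the answer string from an `ask`-style result.

    Assumption: the result is either a plain string or a tuple whose first
    item is the answer text, as the printed output above suggests.
    """
    if isinstance(result, tuple) and result:
        return str(result[0])
    return str(result)
```

With this, `print(answer_text(kg.ask(...)))` would show only the sentence about the fights, without the session object's repr.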
