# Knowledge Graph Creation

Create a knowledge graph from an unstructured Markdown file.

Follow this [official documentation page](https://python.langchain.com/docs/integrations/graphs/memgraph/) from LangChain on the Memgraph integration.

## Generating Entities & Relationships

In [1]:
def load_file_content(file_path: str) -> str:
    with open(file_path, 'r', encoding='utf-8') as f:
        return f.read()

In [2]:
file_path = '../files/elden_ring.md'
file_content = load_file_content(file_path)

In [3]:
from langchain_ollama import ChatOllama

llm = ChatOllama(
    model='qwen2.5:7b',
    base_url='http://localhost:11435',
    format='json',
)

In [4]:
# verify LLM
print(llm.invoke('Who are you? Respond with a JSON object and put the answer in the `response` key.').content)

{
  "response": "I am Qwen, a large language model created by Alibaba Cloud."
}
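
Because the client is created with `format='json'`, the reply content should be a JSON string. A minimal sketch of parsing such a reply defensively, using only the standard library; `parse_json_reply` is a hypothetical helper name, not part of LangChain or Ollama:

```python
import json


def parse_json_reply(text: str) -> dict:
    """Parse a model reply that is expected to be a single JSON object."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object, got {type(data).__name__}")
    return data


# Sample reply copied from the output above.
reply = '{"response": "I am Qwen, a large language model created by Alibaba Cloud."}'
print(parse_json_reply(reply)["response"])
```

With the notebook's objects, the same helper could be applied to `llm.invoke(...).content`.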


The following prompt template originates from [LangChain's `LLMGraphTransformer`](https://github.com/langchain-ai/langchain-experimental/blob/5cdbf02e3771da35d1438bb7597be4575610684e/libs/experimental/langchain_experimental/graph_transformers/llm.py#L110).

In [5]:
# general_instructions = """
# You are a top-tier algorithm designed for extracting information in structured formats to build a knowledge graph.
# Try to capture as much information from the text as possible without sacrificing accuracy. Do not add any information that is not explicitly mentioned in the text.
# The aim is to achieve simplicity and clarity in the knowledge graph, making it accessible for a vast audience.
# Nodes represent entities and concepts. Ensure you use available types for node labels. Ensure you use basic/elementary types for node labels.
# For example, when you identify an entity representing a person, always label it as 'person'. Avoid using more specific terms like 'mathematician' or 'scientist'.
# Never utilize integers as node IDs. Node IDs should be names or human-readable identifiers found in the text.
# Relationships represent connections between entities or concepts.
# Ensure consistency and generality in relationship types when constructing knowledge graphs. Instead of using specific and momentary types such as 'BECAME_PROFESSOR', use more general and timeless relationship types like 'PROFESSOR'. Make sure to use general and timeless relationship types!
# When extracting entities, it's vital to ensure consistency.
# If an entity, such as "John Doe", is mentioned multiple times in the text but is referred to by different names or pronouns (e.g., "Joe", "he"), always use the most complete identifier for that entity throughout the knowledge graph. In this example, use "John Doe" as the entity ID.
# Remember, the knowledge graph should be coherent and easily understandable, so maintaining consistency in entity references is crucial.
# Adhere to the rules strictly. Non-compliance will result in termination.
# """

In [6]:
output_example = {
    'nodes': [
        {
            'id': 'John Doe',
            'type': 'Person',
            'properties': {'profession': 'scientist'}
        }
    ],
    'relationships': [
        {
            'source': 'John Doe',
            'target': 'Introduction to AI',
            'type': 'READ',
            'properties': {'time': 'August 2025'}
        }
    ]
}

In [7]:
output_spec = """
Output Example:
{output_example}
"""

In [8]:
question_template = """
Current Task: You are a top-tier algorithm designed for extracting information in structured formats to build a knowledge graph.
Tip: Make sure to answer in the correct format and do not include any explanations.
Use the given format to extract information from the following input:
{input}
"""

In [9]:
from langchain_core.prompts import PromptTemplate


kg_creation_prompt_template = ''.join([
    # general_instructions,
    output_spec,
    question_template,
])
kg_creation_prompt = PromptTemplate.from_template(kg_creation_prompt_template)
print(kg_creation_prompt.template)


Output Example:
{output_example}

Current Task: You are a top-tier algorithm designed for extracting information in structured formats to build a knowledge graph.
Tip: Make sure to answer in the correct format and do not include any explanations.
Use the given format to extract information from the following input:
{input}
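
The template keeps two placeholders, `{output_example}` and `{input}`, and the example dict is serialized with `json.dumps` before being substituted. The sketch below mimics that substitution with plain `str.format` (an assumption made for illustration; `PromptTemplate` is not used here) to show that braces inside a substituted value are left untouched, so the JSON example needs no escaping:

```python
import json

template = "Output Example:\n{output_example}\n\nInput:\n{input}\n"
example = {"nodes": [{"id": "John Doe", "type": "Person"}], "relationships": []}

# Braces only matter in the template itself; substituted values are inserted verbatim.
prompt = template.format(
    output_example=json.dumps(example),
    input="Some {text} with braces",
)
print(prompt)
```

Braces would only need escaping (as `{{` and `}}`) if the JSON were written directly into the template string.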



In [10]:
import json
final_prompt = kg_creation_prompt.format(input=file_content, output_example=json.dumps(output_example))
print(final_prompt)


Output Example:
{"nodes": [{"id": "John Doe", "type": "Person", "properties": {"profession": "scientist"}}], "relationships": [{"source": "John Doe", "target": "Introduction to AI", "type": "READ", "properties": {"time": "August 2025"}}]}

Current Task: You are a top-tier algorithm designed for extracting information in structured formats to build a knowledge graph.
Tip: Make sure to answer in the correct format and do not include any explanations.
Use the given format to extract information from the following input:
# Elden Ring: A Comprehensive Guide

**Elden Ring** is an action RPG developed by FromSoftware and published by Bandai Namco Entertainment. Directed by Hidetaka Miyazaki, in collaboration with George R. R. Martin, the game blends the intricate lore and challenging gameplay FromSoftware is renowned for with a vast, open-world experience.

---

## Story and Setting

**Elden Ring** is set in the mystical world of the Lands Between, where the Elden Ring, a powerful artifact, 

In [11]:
res = llm.invoke(final_prompt)

In [12]:
import json
from pprint import pprint

res_json = json.loads(res.content)
pprint(res_json)

{'nodes': [{'id': 'Elden Ring',
            'properties': {'developer': 'FromSoftware',
                           'director': 'Hidetaka Miyazaki, George R. R. Martin',
                           'publisher': 'Bandai Namco Entertainment'},
            'type': 'Game'},
           {'id': 'Limgrave', 'properties': {}, 'type': 'Region'},
           {'id': 'Liurnia of the Lakes', 'properties': {}, 'type': 'Region'},
           {'id': 'Caelid', 'properties': {}, 'type': 'Region'},
           {'id': 'The Chapel of Anticipation',
            'properties': {},
            'type': 'Location'},
           {'id': 'Stormveil Castle', 'properties': {}, 'type': 'Location'},
           {'id': 'Raya Lucaria Academy',
            'properties': {},
            'type': 'Location'}],
 'relationships': [{'properties': {},
                    'source': 'Elden Ring',
                    'target': 'Limgrave',
                    'type': 'SET_IN'},
                   {'properties': {},
                    'sour

Finally, convert the parsed JSON dict into `GraphDocument` objects, which the LangChain Memgraph integration can ingest.

In [13]:
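from langchain_memgraph.graphs.graph_document import Node, Relationship, GraphDocument

# Index nodes by ID so that relationships reference the same Node objects.
nodes = {node['id']: Node(**node) for node in res_json['nodes']}
relationships = [Relationship(
    source=nodes[rel['source']],
    target=nodes[rel['target']],
    type=rel['type'],
    # Carry over relationship properties when the model provides them.
    properties=rel.get('properties', {}),
) for rel in res_json['relationships']]
# Pass a real list rather than a dict view.
graph_documents = [GraphDocument(nodes=list(nodes.values()), relationships=relationships)]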

## Adding Generated Results to Memgraph

**NOTE**: Make sure Memgraph is running.

Memgraph Lab should be reachable in a browser at http://localhost:3000/.

The Memgraph MAGE server should be listening for Bolt connections at bolt://localhost:7687.
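
One quick way to confirm that both ports are reachable before connecting is a plain TCP check. This is only a sketch: it verifies that something is listening, not that it is Memgraph, and `is_port_open` is a hypothetical helper. The host and ports are the ones mentioned above.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


for name, port in [("Memgraph Lab", 3000), ("Memgraph (Bolt)", 7687)]:
    status = "reachable" if is_port_open("localhost", port) else "not reachable"
    print(f"{name} on port {port}: {status}")
```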

In [14]:
url = 'bolt://localhost:7687'
username = ''
password = ''

In [15]:
from langchain_memgraph.graphs.memgraph import Memgraph

graph = Memgraph(url=url, username=username, password=password, refresh_schema=False)

In [16]:
# Empty the database
graph.query("STORAGE MODE IN_MEMORY_ANALYTICAL")
graph.query("DROP GRAPH")
graph.query("STORAGE MODE IN_MEMORY_TRANSACTIONAL")

[]

In [17]:
# Create KG
graph.add_graph_documents(graph_documents)
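
To check that the import worked, one option is to count what ended up in the database. The sketch below assumes that `graph.query` returns a list of dicts keyed by the `RETURN` aliases; `count_graph` is a hypothetical helper, and a stub stands in for the database so that the snippet runs on its own:

```python
from typing import Any, Callable


def count_graph(query: Callable[[str], list[dict[str, Any]]]) -> tuple[int, int]:
    """Return (node_count, relationship_count) using two Cypher count queries."""
    node_count = query("MATCH (n) RETURN count(n) AS c")[0]["c"]
    rel_count = query("MATCH ()-[r]->() RETURN count(r) AS c")[0]["c"]
    return node_count, rel_count


def fake_query(cypher: str) -> list[dict[str, Any]]:
    """Stub that mimics the assumed result shape; replace with graph.query."""
    return [{"c": 7}] if "(n)" in cypher else [{"c": 6}]


print(count_graph(fake_query))
```

With the notebook's objects, the call would be `count_graph(graph.query)`, provided the assumption about the result shape holds.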

Remember to shut down the Memgraph servers after testing, if necessary.