# A gentle introduction to the HILDEGARD workflow

Notebook Author: *Cosimo Palma*  
cosimo.palma@phd.unipi.it

HILDEGARD (acronym for "Human In the Loop Data Extraction and Graphically Augmented Relation Discovery") is a Digital Heritage Management Tool aiming at retrieving relationships between Heritage Objects conserved in museums. The following functions makes up a "lightweight" version of HILDEGARD tailored for Digital Historians, where the Digital Heritage LOD (Linked Open Data) block of the pipeline is not included for brevity's sake. It creates a Knowledge Graph based on two seed-Wikipedia entities and saves it in a .csv file that can be easily stored and queried in a kuzu knowledge base.

First of all, we install all necessary modules to build the Web-Scraping, the KG-relationships finder, and the querable knowledge base.

In [None]:
!pip install kuzu requests selenium beautifulsoup4 SPARQLWrapper


Collecting kuzu
  Downloading kuzu-0.6.1-cp310-cp310-manylinux_2_27_x86_64.manylinux_2_28_x86_64.whl.metadata (5.6 kB)
Collecting selenium
  Downloading selenium-4.26.1-py3-none-any.whl.metadata (7.1 kB)
Collecting SPARQLWrapper
  Downloading SPARQLWrapper-2.0.0-py3-none-any.whl.metadata (2.0 kB)
Collecting trio~=0.17 (from selenium)
  Downloading trio-0.27.0-py3-none-any.whl.metadata (8.6 kB)
Collecting trio-websocket~=0.9 (from selenium)
  Downloading trio_websocket-0.11.1-py3-none-any.whl.metadata (4.7 kB)
Collecting rdflib>=6.1.1 (from SPARQLWrapper)
  Downloading rdflib-7.1.1-py3-none-any.whl.metadata (11 kB)
Collecting isodate<1.0.0,>=0.7.2 (from rdflib>=6.1.1->SPARQLWrapper)
  Downloading isodate-0.7.2-py3-none-any.whl.metadata (11 kB)
Collecting sortedcontainers (from trio~=0.17->selenium)
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl.metadata (10 kB)
Collecting outcome (from trio~=0.17->selenium)
  Downloading outcome-1.3.0.post0-py2.py3-none-any.whl.metadata (2.6 

Through this codelet the chrome-driver for using the WebScraper is installed.

In [None]:
!apt-get update
!apt-get install -y chromium-chromedriver
!cp /usr/lib/chromium-browser/chromedriver /usr/bin


0% [Working]            Get:1 http://security.ubuntu.com/ubuntu jammy-security InRelease [129 kB]
0% [Connecting to archive.ubuntu.com] [1 InRelease 14.2 kB/129 kB 11%] [Connected to cloud.r-project                                                                                                    Get:2 https://cloud.r-project.org/bin/linux/ubuntu jammy-cran40/ InRelease [3,626 B]
0% [Connecting to archive.ubuntu.com (91.189.91.83)] [1 InRelease 51.8 kB/129 kB 40%] [2 InRelease 00% [Connecting to archive.ubuntu.com (91.189.91.83)] [1 InRelease 73.5 kB/129 kB 57%] [Connected to                                                                                                     Hit:3 http://archive.ubuntu.com/ubuntu jammy InRelease
0% [Waiting for headers] [1 InRelease 129 kB/129 kB 100%] [Waiting for headers] [Waiting for headers0% [Waiting for headers] [Waiting for headers] [Waiting for headers] [Connecting to ppa.launchpadcon                                                 

*Input entities validation*

In [None]:
import requests
import json
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
import time
from SPARQLWrapper import SPARQLWrapper, JSON

def validate_entity(entity):
    """Validate if a Wikipedia entity exists."""
    response = requests.get(f"https://en.wikipedia.org/wiki/{entity}")
    return response.status_code == 200

# Take user input and validate
entity_start = input("Enter the starting Wikipedia entity: ")
while not validate_entity(entity_start):
    print(f"{entity_start} is not a valid Wikipedia entity. Try again.")
    entity_start = input("Enter the starting Wikipedia entity: ")

entity_end = input("Enter the target Wikipedia entity: ")
while not validate_entity(entity_end):
    print(f"{entity_end} is not a valid Wikipedia entity. Try again.")
    entity_end = input("Enter the target Wikipedia entity: ")

print(f"Valid entities: {entity_start} and {entity_end}")


Enter the starting Wikipedia entity: Anubis
Enter the target Wikipedia entity: Alexander_the_Great
Valid entities: Anubis and Alexander_the_Great


*Shortest Path algorithm between two input entities by Web Scraping*

Through the following functions the website "Six Degrees of Wikipedia" is scraped for retrieving middle entities between the two input ones. For each entity, the title, the description and the URL are stored.

In [None]:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.keys import Keys
import time

def related_entities_triples(start, end):
    options = Options()
    options.add_argument('--headless')
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-dev-shm-usage')

    driver = webdriver.Chrome(options=options)
    driver.get(f"https://www.sixdegreesofwikipedia.com/?source={start}&target={end}")

    # Click the button to generate the shortest path
    try:
        driver.find_element(By.CSS_SELECTOR, "button").click()
        time.sleep(5)  # Allow time for content to load
    except Exception as e:
        print("Error clicking button:", e)
        driver.quit()
        return []

    # Scroll to load the "INDIVIDUAL PATHS" content
    try:
        webtext = driver.find_elements(By.XPATH, "//div[1]/div[2]/div[5]")[0]  # Container for paths content
        for _ in range(5):  # Scroll down several times to ensure content loads
            driver.find_element(By.TAG_NAME, 'body').send_keys(Keys.PAGE_DOWN)
            time.sleep(1)  # Wait briefly for new content to load

        webtexto = webtext.text
    except Exception as e:
        print("Error retrieving 'INDIVIDUAL PATHS' content:", e)
        driver.quit()
        return []

    # Process the extracted text from "INDIVIDUAL PATHS"
    hrefs_list = []
    titles_list = []
    captions_list = []

    # Split webtext by lines to parse titles and captions
    lines = webtexto.split("\n")
    for i in range(0, len(lines), 2):  # Assuming title and caption alternate in lines
        if i < len(lines):
            titles_list.append(lines[i])  # Title on even lines
        if i + 1 < len(lines):
            captions_list.append(lines[i + 1])  # Caption on odd lines

    # Create triples with titles, captions, and hrefs
    triples = []
    for title, caption in zip(titles_list, captions_list):
        href = f"https://en.wikipedia.org/wiki/{title.replace(' ', '_')}"
        triples.append({
            "title": title,
            "caption": caption,
            "href": href
        })

    # Generate triple groups
    triple_groups = []
    for i in range(len(triples) - 2):
        triple_groups.append((triples[i], triples[i+1], triples[i+2]))

    driver.quit()
    return triple_groups

# Example usage
#start_entity = "Anubis"
#end_entity = "Tale of Two Brothers"
entity_triples = related_entities_triples(entity_start, entity_end)
print(entity_triples)


[({'title': 'Anubis', 'caption': 'Egyptian deity of mummification and the afterlife, usually depicted as a man with a canine head', 'href': 'https://en.wikipedia.org/wiki/Anubis'}, {'title': 'Early Dynastic Period (Egypt)', 'caption': 'Period of ancient Egyptian history', 'href': 'https://en.wikipedia.org/wiki/Early_Dynastic_Period_(Egypt)'}, {'title': 'Alexander the Great', 'caption': 'King of Macedonia and conqueror of Achaemenid Persia (356–323 BC)', 'href': 'https://en.wikipedia.org/wiki/Alexander_the_Great'}), ({'title': 'Early Dynastic Period (Egypt)', 'caption': 'Period of ancient Egyptian history', 'href': 'https://en.wikipedia.org/wiki/Early_Dynastic_Period_(Egypt)'}, {'title': 'Alexander the Great', 'caption': 'King of Macedonia and conqueror of Achaemenid Persia (356–323 BC)', 'href': 'https://en.wikipedia.org/wiki/Alexander_the_Great'}, {'title': 'Anubis', 'caption': 'Egyptian deity of mummification and the afterlife, usually depicted as a man with a canine head', 'href': '

# CIDOC-CRM Ontology Harmonization

This procedures connects the titles, descriptions and URLs previously retrieved using the CIDOC-CRM ontology.

Mapping:

P67: refersTo

P102: hasTitle

P104: isSubjectTo

P196: defines

In [None]:
def harmonize_triples_to_crm(triple_groups):
    """
    Harmonizes a list of triple groups into CIDOC-CRM ontology format.

    Parameters:
    - triple_groups: List of tuple groups, where each group contains dictionaries
                     with "title", "caption", and "href" keys.

    Returns:
    - A list of dictionaries in CIDOC-CRM harmonized format.
    """
    harmonized_triples = []

    for group in triple_groups:
        for triple in group:
            title = triple["title"]
            caption = triple["caption"]
            href = triple["href"]

            # Map to CIDOC-CRM relations
            harmonized_triples.extend([
                {"title": title, "cidoc-relation": "P104", "descr": caption},
                {"descr": caption, "cidoc-relation": "P196", "uri": href},
                {"uri": href, "cidoc-relation": "P102", "title": title},
                {"descr": caption, "cidoc-relation": "P196", "title": title},
                {"title": title, "cidoc-relation": "P104", "descr": caption},
                {"uri": href, "cidoc-relation": "P67", "descr": caption},
                {"title": title, "cidoc-relation": "P67", "uri": href}
            ])

        # Add a "prev_title" relation to link the last entity to the next one
        for idx in range(1, len(group)):
            previous_triple = group[idx - 1]
            current_triple = group[idx]
            harmonized_triples.append({
                "prev_title": previous_triple["title"],
                "cidoc-relation": "P67",
                "title": current_triple["title"]
            })

    return harmonized_triples

# Example usage
crm_harmonized_triples = harmonize_triples_to_crm(entity_triples)
print(crm_harmonized_triples)


[{'title': 'Anubis', 'cidoc-relation': 'P104', 'descr': 'Egyptian deity of mummification and the afterlife, usually depicted as a man with a canine head'}, {'descr': 'Egyptian deity of mummification and the afterlife, usually depicted as a man with a canine head', 'cidoc-relation': 'P196', 'uri': 'https://en.wikipedia.org/wiki/Anubis'}, {'uri': 'https://en.wikipedia.org/wiki/Anubis', 'cidoc-relation': 'P102', 'title': 'Anubis'}, {'descr': 'Egyptian deity of mummification and the afterlife, usually depicted as a man with a canine head', 'cidoc-relation': 'P196', 'title': 'Anubis'}, {'title': 'Anubis', 'cidoc-relation': 'P104', 'descr': 'Egyptian deity of mummification and the afterlife, usually depicted as a man with a canine head'}, {'uri': 'https://en.wikipedia.org/wiki/Anubis', 'cidoc-relation': 'P67', 'descr': 'Egyptian deity of mummification and the afterlife, usually depicted as a man with a canine head'}, {'title': 'Anubis', 'cidoc-relation': 'P67', 'uri': 'https://en.wikipedia.o

# DBpedia relationship finder

This function takes in input the entity pairs retrieved through the shortest path algorithm and recursively executes a SPARQL query to find non-trivial DBpedia relationships.
Change the parameter *num_mids* to modify the number of middle relationships between the given entities.


In [None]:
from SPARQLWrapper import SPARQLWrapper, JSON
from urllib.error import HTTPError
import time
import json

def execute_query(query, retries=3, wait=2):
    """Executes a SPARQL query on DBpedia with retry logic."""
    sparql = SPARQLWrapper("http://dbpedia.org/sparql")
    sparql.setQuery(query)
    sparql.setReturnFormat(JSON)

    for attempt in range(retries):
        try:
            results = sparql.query().convert()
            return results["results"]["bindings"]
        except HTTPError as e:
            print(f"HTTPError: {e} - Retrying ({attempt + 1}/{retries})...")
            time.sleep(wait)  # Wait before retrying
        except Exception as e:
            print(f"An unexpected error occurred: {e}")
            break

    print(f"Query failed after {retries} attempts.")
    return []  # Return an empty list if the query fails

def generate_query(entity1, entity2, num_mids=5):
    """Generates a SPARQL query with intermediate nodes and filters."""

    # Define prefixes and initial part of the query
    query = f"""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    PREFIX dbr: <http://dbpedia.org/resource/>
    PREFIX owl: <http://www.w3.org/2002/07/owl#>
    SELECT ?entity1 {" ".join([f"?pf{i} ?mid{i}" for i in range(1, num_mids + 1)])} ?pf{num_mids + 1} ?entity2
    WHERE {{
      VALUES (?entity1 ?entity2) {{ (dbr:{entity1} dbr:{entity2}) }}
      ?entity1 ?pf1 ?mid1 .
    """

    # Loop to add the intermediate relationships based on num_mids
    for i in range(1, num_mids + 1):
        query += f"?mid{i} ?pf{i+1} ?mid{i+1} .\n"

    # Final connection to the target entity
    query += f"?mid{num_mids} ?pf{num_mids + 1} ?entity2 .\n"

    # Filter to ensure distinct nodes in the path
    query += "FILTER(?entity1 != ?mid1 && ?entity2 != ?mid1 "
    for i in range(1, num_mids + 1):
        query += f"&& ?mid{i} != ?mid{i+1} "
    query += "&& ?entity1 != ?entity2) \n"

    # Additional FILTER to exclude unwanted properties
    for i in range(1, num_mids + 2):
        if i != 5:  # Skip filter for certain relationships (if needed)
            query += f"FILTER (?pf{i} NOT IN (dbo:Person, dbo:wikiPageWikiLink, owl:Thing)) \n"

    # Close the query
    query += "} LIMIT 20"

    return query.strip()

def find_relationships_for_entity_pairs(triple_groups):
    """
    Finds DBpedia relationships between each pair of entities in `triple_groups`.

    Parameters:
    - triple_groups: List of triple groups where each entry is a dictionary
                     containing "title" for each entity.

    Returns:
    - A dictionary where each key is an (entity1, entity2) pair, and the value is
      a list of relationships between them.
    """
    relationships = {}
    failed_queries = []  # Track failed queries

    # Collect all unique entity pairs across triple groups
    pairs = set()
    for group in triple_groups:
        titles = [triple["title"].replace(" ", "_") for triple in group]
        pairs.update((titles[i], titles[j]) for i in range(len(titles)) for j in range(i + 1, len(titles)))

    # Execute SPARQL queries for each unique entity pair
    for entity1, entity2 in pairs:
        query = generate_query(entity1, entity2)
        print(f"Finding relationships between {entity1} and {entity2}...")  # Debug output
        results = execute_query(query)

        if results:
            # Parse results to capture relationships
            relationship_data = []
            for result in results:
                relationship_path = []
                for key, value in result.items():
                    relationship_path.append(value["value"])
                relationship_data.append(relationship_path)

            # Store the results in the dictionary
            relationships[(entity1, entity2)] = relationship_data
        else:
            print(f"Failed to retrieve relationships for {entity1} and {entity2}.")
            failed_queries.append((entity1, entity2))

    if failed_queries:
        print("The following queries failed and were retried without success:")
        for entity1, entity2 in failed_queries:
            print(f" - {entity1} to {entity2}")

    return relationships

dbpedia_relationships = find_relationships_for_entity_pairs(entity_triples)



Finding relationships between Egyptian_hieroglyphs and Anubis...
Finding relationships between Greeks and Alexander_the_Great...
Finding relationships between Alexander_the_Great and Osiris...
Finding relationships between Anubis and Early_Dynastic_Period_(Egypt)...
An unexpected error occurred: QueryBadFormed: A bad request has been sent to the endpoint: probably the SPARQL query is badly formed. 

Response:
b"Virtuoso 37000 Error SP030: SPARQL compiler, line 7: syntax error at '(' before 'Egypt'\n\nSPARQL query:\n#output-format:application/sparql-results+json\nPREFIX dbo: <http://dbpedia.org/ontology/>\n    PREFIX dbr: <http://dbpedia.org/resource/>\n    PREFIX owl: <http://www.w3.org/2002/07/owl#>\n    SELECT ?entity1 ?pf1 ?mid1 ?pf2 ?mid2 ?pf3 ?mid3 ?pf4 ?mid4 ?pf5 ?mid5 ?pf6 ?entity2\n    WHERE {\n      VALUES (?entity1 ?entity2) { (dbr:Anubis dbr:Early_Dynastic_Period_(Egypt)) }\n      ?entity1 ?pf1 ?mid1 .\n    ?mid1 ?pf2 ?mid2 .\n?mid2 ?pf3 ?mid3 .\n?mid3 ?pf4 ?mid4 .\n?mid4 ?p

# Saving DBpedia relationships in JSON format

In [None]:
# Save dbpedia_relationships to a .txt file in JSON format

import json

# Convert tuple keys to strings for JSON compatibility
dbpedia_relationships_str_keys = {str(key): value for key, value in dbpedia_relationships.items()}

# Save the modified dictionary to a .txt file in JSON format
with open("dbpedia_relationships.txt", "w", encoding="utf-8") as file:
    json.dump(dbpedia_relationships_str_keys, file, ensure_ascii=False, indent=4)

print("DBpedia relationships saved to dbpedia_relationships.txt")
print(dbpedia_relationships)

DBpedia relationships saved to dbpedia_relationships.txt
{('Egyptian_hieroglyphs', 'Anubis'): [['http://dbpedia.org/resource/Egyptian_hieroglyphs', 'http://dbpedia.org/property/children', 'http://dbpedia.org/resource/Hieratic', 'http://dbpedia.org/property/children', 'http://dbpedia.org/resource/Demotic_(Egyptian)', 'http://dbpedia.org/property/fam', 'http://dbpedia.org/resource/Egyptian_hieroglyphs', 'http://dbpedia.org/property/name', 'http://dbpedia.org/resource/Ptolemaic_dynasty', 'http://dbpedia.org/ontology/wikiPageWikiLink', 'http://dbpedia.org/resource/Hermes', 'http://dbpedia.org/property/equivalent', 'http://dbpedia.org/resource/Anubis'], ['http://dbpedia.org/resource/Egyptian_hieroglyphs', 'http://dbpedia.org/property/children', 'http://dbpedia.org/resource/Proto-Sinaitic_script', 'http://dbpedia.org/property/children', 'http://dbpedia.org/resource/Geʽez_script', 'http://dbpedia.org/property/fam', 'http://dbpedia.org/resource/Egyptian_hieroglyphs', 'http://dbpedia.org/proper

# Saving retrieved DBpedia triples in a CSV file



In [None]:
import json
import csv

# Step 1: Load the JSON data
with open('/content/dbpedia_relationships.txt', 'r') as file:
    dbpedia_relationships = json.load(file)

# Step 2: Collect triples
unique_triples = set()  # Using a set to avoid duplicate triples

# Process each entry in dbpedia_relationships
for paths in dbpedia_relationships.values():
    for path in paths:
        # Build triples along each path
        for i in range(0, len(path) - 2, 2):
            subject = path[i]
            predicate = path[i + 1]
            obj = path[i + 2]
            unique_triples.add((subject, predicate, obj))

# Step 3: Save unique triples to a CSV file
with open("dbpedia_relationships_triples.csv", "w", newline="", encoding="utf-8") as csvfile:
    writer = csv.writer(csvfile)
    writer.writerow(["subject", "predicate", "object"])  # Header

    for triple in unique_triples:
        writer.writerow(triple)

print("Unique triples saved to dbpedia_relationships_triples.csv")


Unique triples saved to dbpedia_relationships_triples.csv


# Saving shortest-path-retrieved triples in a CSV file

In [None]:
import csv

def convert_and_save_to_csv(harmonized_triples, csv_filename="knowledge_graph_triples.csv"):
    """
    Convert the output of harmonize_triples_to_crm and find_relationships_for_entity_pairs
    into CSV format compatible with Kuzu and Kiara.

    Args:
        harmonized_triples (list): Output from harmonize_triples_to_crm.
        dbpedia_relationships (dict): Output from find_relationships_for_entity_pairs.
        csv_filename (str): Name of the CSV file to save.
    """

    # Prepare data for CSV format: (subject, predicate, object)
    csv_data = []

    # Process harmonized triples
    for triple in harmonized_triples:
        # Extract the subject, predicate, and object based on available keys
        if "title" in triple and "descr" in triple:
            csv_data.append((triple["title"], triple["cidoc-relation"], triple["descr"]))
        elif "title" in triple and "uri" in triple:
            csv_data.append((triple["title"], triple["cidoc-relation"], triple["uri"]))
        elif "descr" in triple and "uri" in triple:
            csv_data.append((triple["descr"], triple["cidoc-relation"], triple["uri"]))
        elif "prev_title" in triple and "title" in triple:
            csv_data.append((triple["prev_title"], triple["cidoc-relation"], triple["title"]))
        else:
            print(f"Skipping incomplete data in dictionary format: {triple}")

    # Save to CSV
    with open(csv_filename, mode="w", newline="", encoding="utf-8") as file:
        writer = csv.writer(file)
        writer.writerow(["subject", "predicate", "object"])  # Header
        writer.writerows(csv_data)

    print(f"Data successfully saved to {csv_filename}")


convert_and_save_to_csv(crm_harmonized_triples)


Data successfully saved to knowledge_graph_triples.csv


# Merging the files

In [None]:
import csv

# Define the paths to the CSV files
file1 = "/content/dbpedia_relationships_triples.csv"
file2 = "/content/knowledge_graph_triples.csv"
merged_file = "/content/merged_knowledge_graph_triples.csv"

# Step 1: Collect triples from both files into a set to ensure uniqueness
unique_triples = set()

# Read the first CSV file
with open(file1, mode="r", newline="", encoding="utf-8") as f1:
    reader = csv.reader(f1)
    next(reader)  # Skip header
    for row in reader:
        if len(row) == 3:  # Ensure row has subject, predicate, object
            unique_triples.add(tuple(row))

# Read the second CSV file
with open(file2, mode="r", newline="", encoding="utf-8") as f2:
    reader = csv.reader(f2)
    next(reader)  # Skip header
    for row in reader:
        if len(row) == 3:  # Ensure row has subject, predicate, object
            unique_triples.add(tuple(row))

# Step 2: Write the merged unique triples to a new CSV file
with open(merged_file, mode="w", newline="", encoding="utf-8") as mf:
    writer = csv.writer(mf)
    writer.writerow(["subject", "predicate", "object"])  # Header
    for triple in unique_triples:
        writer.writerow(triple)

print(f"Data successfully merged into {merged_file}")


Data successfully merged into /content/merged_knowledge_graph_triples.csv


Initializing kuzu knowledge base

In [None]:
import kuzu
import csv
import shutil
import os

# Specify the database path
database_path = "knowledge_graph_db"

# Check if the directory exists and delete it (including WAL files)
if os.path.exists(database_path):
    shutil.rmtree(database_path)
    print(f"Database directory '{database_path}' deleted.")

# Re-initialize the Kuzu database
db = kuzu.Database(database_path)
conn = kuzu.Connection(db)

# Step 2: Create the schema for entities and relationships without additional attributes
try:
    # Define node type for entities
    conn.execute("""
    CREATE NODE TABLE Entity (
        uri STRING,
        PRIMARY KEY (uri)
    )
    """)

    # Define a basic relationship type without additional attributes
    conn.execute("""
    CREATE REL TABLE RELATIONSHIP (FROM Entity TO Entity)
    """)

    print("Schema created successfully.")
except Exception as e:
    print(f"Error creating schema: {e}")


Schema created successfully.


Populating the kuzu Knowledge Base with the generated CSV file

In [None]:
# Step 3: Define function to insert nodes and relationships
def insert_triple(conn, subject, predicate, obj):
    try:
        # Insert the subject and object nodes if they don't already exist
        conn.execute(f"MERGE (s:Entity {{uri: '{subject}'}})")
        conn.execute(f"MERGE (o:Entity {{uri: '{obj}'}})")

        # Insert the relationship without storing the predicate directly
        conn.execute(f"""
        MATCH (s:Entity {{uri: '{subject}'}}), (o:Entity {{uri: '{obj}'}})
        MERGE (s)-[:RELATIONSHIP]->(o)
        """)
    except Exception as e:
        print(f"Error inserting triple ({subject}, {predicate}, {obj}): {e}")

# Step 4: Upload data from CSV to Kuzu
csv_file = "/content/merged_knowledge_graph_triples.csv"

with open(csv_file, mode="r", newline="", encoding="utf-8") as file:
    reader = csv.reader(file)
    next(reader)  # Skip header

    for row in reader:
        subject, predicate, obj = row
        insert_triple(conn, subject, predicate, obj)

print("Data successfully uploaded to Kuzu database.")

Error inserting triple (Mythical bull in Egyptian mythology, incarnation of Montou's ka, P67, https://en.wikipedia.org/wiki/Buchis): Parser exception: Invalid input <MERGE (s:Entity {uri: 'Mythical bull in Egyptian mythology, incarnation of Montou's>: expected rule oC_SingleQuery (line: 1, offset: 82)
"MERGE (s:Entity {uri: 'Mythical bull in Egyptian mythology, incarnation of Montou's ka'})"
                                                                                   ^
Error inserting triple (Buchis, P104, Mythical bull in Egyptian mythology, incarnation of Montou's ka): Parser exception: Invalid input <MERGE (o:Entity {uri: 'Mythical bull in Egyptian mythology, incarnation of Montou's>: expected rule oC_SingleQuery (line: 1, offset: 82)
"MERGE (o:Entity {uri: 'Mythical bull in Egyptian mythology, incarnation of Montou's ka'})"
                                                                                   ^
Error inserting triple (Mythical bull in Egyptian mythology, incarnat

# Future Work

The present workflow can be improved by inserting the Digital Heritage LOD and by providing some templates of kuzu queries that the user can use off-the-shelf to explore the knowledge base.