This code utilises the design_kgex package provided here: https://github.com/siddharthl93/engineering-design-knowledge/tree/main
Please go through the example-usage.ipynb notebook in that repo first.

The code below utilises the model trained by Siddharth, L., Luo, J., 2024. Retrieval-Augmented Generation using Engineering Design Knowledge. (cs.CL)
https://arxiv.org/abs/2307.06985

The use case is the abstracts of HVAC patents.
The (head, relation, tail) triplets extracted from these abstracts are converted into a Neo4j knowledge graph, which is queried using graphRAG. The end output is a chatbot that delivers context-informed answers to queries.

__Package Installation__
Please install the following packages in the desired Python environment.

In [None]:
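from IPython.display import clear_output

# spaCy with transformer support (quoted so the brackets survive any shell)
!pip install "spacy[transformers]"
# patoolib is the import name; the PyPI package is called "patool"
!pip install spacy torch patool bs4
# Packages used later in this notebook (graph loading and the chatbot)
!pip install pandas py2neo gradio langchain-core langchain-community
!python -m spacy download en_core_web_sm
!python -m spacy download en_core_web_trf

clear_output()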


__Module Import__
The package modules are imported as follows. The underlying trained transformer models are downloaded the first time the package is imported.
Please ensure that the package folder "design_kgex" is located in the current working directory (the cell below changes into the repo folder that contains it).
The package can be downloaded from GitHub - *https://github.com/siddharthl93/engineering-design-knowledge*

In [None]:
import os
import sys

# Folder that contains design_kgex; adjust to wherever you downloaded the repo
REPO_DIR = r"C:\Users\Admin\Desktop\engineering-design-knowledge-main"
os.chdir(REPO_DIR)
print(os.getcwd())  # confirm the working directory changed

# Make design_kgex importable regardless of how the kernel was started
sys.path.insert(0, os.path.abspath(REPO_DIR))

# The first import downloads the trained transformer models
from design_kgex import design_knowledge


C:\Users\Admin\Desktop\engineering-design-knowledge-main


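The raw text, a CSV of HVAC patents with an "Abstract" column (justia_patents.csv), is provided in the repo. Each abstract is split into sentences with spaCy, and only sentences of 15 to 100 words are kept for triplet extraction.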

In [None]:
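import pandas as pd
import spacy

# Small English pipeline, used here only for sentence segmentation
nlp = spacy.load("en_core_web_sm")

# CSV of HVAC patents; must contain an "Abstract" column
df = pd.read_csv(r"C:\Users\Admin\Desktop\hvac_kg_project\justia_patents.csv")

# Split every abstract into sentences (nlp.pipe batches the documents)
# and keep only sentences with 15-100 whitespace-separated words
sentences = [
    sent.text.strip()
    for doc in nlp.pipe(df["Abstract"].dropna().astype(str))
    for sent in doc.sents
    if 15 <= len(sent.text.split()) <= 100
]
print(f"{len(sentences)} sentences kept")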






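The list comprehension above keeps only sentences of 15 to 100 whitespace-separated words. The dependency-free sketch below applies the same length rule, but swaps spaCy's sentence segmenter for a naive regular-expression split on '.', '!' and '?'. That swap is an assumption made only so the example runs without a model; unlike spaCy, it will mis-split abbreviations such as "e.g.". The helper name split_and_filter and the sample abstract are hypothetical.

```python
import re


def split_and_filter(abstracts, min_words=15, max_words=100):
    """Split abstracts into sentences and keep those within the word-count bounds.

    Naive stand-in for spaCy's doc.sents: splits after '.', '!' or '?'
    when followed by whitespace.
    """
    kept = []
    for abstract in abstracts:
        if not isinstance(abstract, str):  # mirrors dropna(): skip missing abstracts
            continue
        for sent in re.split(r"(?<=[.!?])\s+", abstract.strip()):
            sent = sent.strip()
            if min_words <= len(sent.split()) <= max_words:
                kept.append(sent)
    return kept


sample_abstracts = [
    "A heating, ventilation, and air conditioning (HVAC) system includes a controller. "
    "The controller receives a temperature signal from a sensor in a zone and adjusts "
    "the speed of a blower to maintain a setpoint temperature.",
    None,  # missing abstract
]
filtered = split_and_filter(sample_abstracts)
print(filtered)  # only the 24-word second sentence survives
```

The first sentence (11 words) falls below the 15-word minimum, which is the kind of short, low-content sentence the filter is meant to drop before the slow extraction step.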
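The triplets are extracted from the filtered sentences as below. Without a GPU this step is slow: the run shown below took about 2.5 hours for 2,184 sentences.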

In [None]:
import pandas as pd

# Extract (head, relation, tail) facts from every filtered sentence
extracted_knowledge = design_knowledge.extractDesignKnowledge(sentences)

# Convert the list of dictionaries to a DataFrame (kept separate from the patents df)
knowledge_df = pd.DataFrame(extracted_knowledge)

# Save to CSV; this file is later copied into the Neo4j import folder
output_path = r"C:\Users\Admin\Desktop\extracted_design_knowledge.csv"
knowledge_df.to_csv(output_path, index=False)

print(f"✅ Extracted knowledge saved to: {output_path}")


Sorry, no GPU is available! Processing will be performed in normal time.


100%|██████████| 2184/2184 [2:25:52<00:00,  4.01s/it]  

✅ Extracted knowledge saved to: C:\Users\Admin\Desktop\extracted_design_knowledge.csv





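To run the code below on your system:
1. Install Neo4j and load the knowledge graph(2) that I have provided as a CSV.
2. Install Ollama and start the Mistral model by typing "ollama run mistral" in the command prompt.
3. Run the cells below.

*Loading the knowledge graph takes a few more steps: the first cell below reads extracted_design_knowledge.csv from the import folder of your Neo4j DBMS, so copy the CSV there and adjust the path. Enter the password for YOUR DBMS in place of "password" below.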

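The loading cell below uses MERGE, so repeated triplets collapse into a single edge, but each triplet still costs one round trip to Neo4j. The sketch below runs without a database: it parses the facts column the same way (ast.literal_eval), skips malformed cells, deduplicates triplets, and groups them into batches so that one UNWIND query could load a whole batch. The helper names parse_triplets and batched, the batch size, the UNWIND query and the sample cell format (a Python list of 3-element tuples) are assumptions for illustration, not part of design_kgex. With py2neo, each batch could be sent with graph.run(UNWIND_QUERY, rows=batch).

```python
import ast

# Hypothetical batched version of the loader's Cypher: one query per batch of rows
UNWIND_QUERY = """
UNWIND $rows AS row
MERGE (h:Entity {name: row.head})
MERGE (t:Entity {name: row.tail})
MERGE (h)-[:RELATION {type: row.relation}]->(t)
"""


def parse_triplets(facts_column):
    """Parse stringified triplet lists into unique {head, relation, tail} dicts."""
    seen = set()
    rows = []
    for raw in facts_column:
        if not isinstance(raw, str):  # e.g. NaN when a sentence yielded no facts
            continue
        try:
            facts = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            continue  # malformed cell: skip it
        if not isinstance(facts, (list, tuple)):
            continue
        for triplet in facts:
            if not isinstance(triplet, (list, tuple)) or len(triplet) != 3:
                continue
            key = tuple(str(part).strip() for part in triplet)
            if key not in seen:  # MERGE would collapse duplicates anyway
                seen.add(key)
                rows.append({"head": key[0], "relation": key[1], "tail": key[2]})
    return rows


def batched(rows, size=500):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


sample_facts = [
    "[('heat pump', 'comprises', 'compressor'), ('heat pump', 'comprises', 'compressor')]",
    "[('controller', 'regulates', 'damper')]",
    float("nan"),  # missing value, as pandas would produce
    "not a list",  # malformed cell
]
rows = parse_triplets(sample_facts)
batches = list(batched(rows, size=1))
print(len(rows), len(batches))  # 2 2
```

Because the same MERGE clauses are used, loading in batches should produce the same graph as the row-by-row loop while sending far fewer queries.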
In [None]:
import ast

import pandas as pd
from py2neo import Graph

# Connect to Neo4j; replace "password" with the password of YOUR DBMS
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

# Read the extracted triplets from the DBMS import folder
df = pd.read_csv(r"C:\Users\Admin\.Neo4jDesktop\relate-data\dbmss\dbms-b00ba9a5-f609-46b2-b60e-070ada63a936\import\extracted_design_knowledge.csv")

# Each "facts" cell is a stringified list of (head, relation, tail) triplets
for index, row in df.iterrows():
    if not isinstance(row["facts"], str):  # skip rows with no extracted facts
        continue
    try:
        facts = ast.literal_eval(row["facts"])  # convert the string to a Python list
        for triplet in facts:
            if len(triplet) != 3:
                continue
            head, relation, tail = (str(part).strip() for part in triplet)

            # MERGE only creates nodes/relationships that do not already exist
            graph.run("""
                MERGE (h:Entity {name: $head})
                MERGE (t:Entity {name: $tail})
                MERGE (h)-[r:RELATION {type: $relation}]->(t)
            """, head=head, relation=relation, tail=tail)
    except (ValueError, SyntaxError, TypeError) as e:
        # Parsing problems are reported per row; database errors still stop the loop
        print(f"Error on row {index}: {e}")

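In the chatbot cell that follows, extract_keywords keeps every word longer than two characters, so question words such as "what", "the", "does" and "with" become search keywords. Because matching uses CONTAINS, "the" is also a substring of many entity names (for example "other" or "thermostat"), so such words can push more relevant triplets out of the top results. A minimal stopword-filtering variant is sketched below; the stopword list and the name extract_keywords_filtered are assumptions, and spaCy's built-in list (spacy.lang.en.stop_words.STOP_WORDS) would be a more complete alternative.

```python
import re

# Hypothetical, hand-picked stopword list (question and function words); extend as needed
STOPWORDS = {
    "what", "which", "who", "how", "why", "when", "where", "the", "and", "for",
    "does", "did", "are", "was", "were", "with", "from", "into", "that", "this",
    "these", "those", "can", "could", "should", "would", "about", "between", "its",
}


def extract_keywords_filtered(query, min_len=3):
    """Lower-case the query, strip punctuation, and drop stopwords, short words and repeats."""
    query = re.sub(r"[^\w\s]", "", query.lower())
    keywords = []
    for word in query.split():
        if len(word) >= min_len and word not in STOPWORDS and word not in keywords:
            keywords.append(word)
    return keywords


print(extract_keywords_filtered("What is the role of the zone control unit in an HVAC system?"))
# ['role', 'zone', 'control', 'unit', 'hvac', 'system']
```

To try it, call extract_keywords_filtered instead of extract_keywords inside answer_question. Unlike the original, this variant also drops repeated words, so a word that appears twice in a question is not counted twice in the score.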

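search_graph in the chatbot cell ranks each edge by counting keyword matches: one point for every keyword contained in the head name, the tail name, or the relation type. The pure-Python sketch below reproduces that scoring over an in-memory list of triplets and formats the result the same way answer_question builds its context, which can help when checking retrieval behaviour without a running Neo4j instance. The function names and sample triplets are hypothetical; ties here keep input order, whereas ORDER BY score DESC in Cypher does not guarantee any order among equal scores.

```python
def score_triplet(triplet, keywords):
    """One point for each keyword contained in the head, the relation and the tail."""
    head, relation, tail = (part.lower() for part in triplet)
    return sum(
        (kw in head) + (kw in relation) + (kw in tail)
        for kw in (k.lower() for k in keywords)
    )


def search_triplets(triplets, keywords, limit=10):
    """Return the top-`limit` triplets with a positive score, best first."""
    scored = [(score_triplet(t, keywords), i, t) for i, t in enumerate(triplets)]
    matches = [(s, i, t) for s, i, t in scored if s > 0]
    matches.sort(key=lambda item: (-item[0], item[1]))  # ties keep input order
    return [t for _, _, t in matches[:limit]]


def format_context(triplets):
    """Render triplets in the 'head --relation--> tail' form used by answer_question."""
    return "\n".join(f"{h} --{r}--> {t}" for h, r, t in triplets)


kg = [
    ("zone control unit", "regulates", "damper"),
    ("HVAC system", "comprises", "zone control unit"),
    ("heat pump", "includes", "compressor"),
]
top = search_triplets(kg, ["zone", "control", "hvac"])
print(format_context(top))
# HVAC system --comprises--> zone control unit
# zone control unit --regulates--> damper
```

The second triplet scores 3 ("hvac" in the head, "zone" and "control" in the tail) and outranks the first, which scores 2; the heat pump triplet scores 0 and is excluded, just as WHERE score > 0 excludes it in the Cypher query.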
In [None]:
import gradio as gr
from py2neo import Graph
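# Ollama from langchain_community is deprecated in newer LangChain releases;
# langchain_ollama.OllamaLLM is the suggested replacement
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate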
import re

# ---- SETUP ----

# Replace "password" with the password of YOUR DBMS
graph = Graph("bolt://localhost:7687", auth=("neo4j", "password"))

prompt_template = PromptTemplate(
    input_variables=["context", "user_query"],
    template="""
You are a helpful assistant. Use the context below to answer the question.
If the context does not contain enough information, say "I don't know based on the current knowledge."

Context:
{context}

Question: {user_query}

Answer:
"""
)

def extract_keywords(query):
    query = query.lower()
    query = re.sub(r"[^\w\s]", "", query)
    keywords = [word for word in query.split() if len(word) > 2]
    return keywords

def search_graph(graph, keywords, limit=10):
    if not keywords:
        return []

    # Score = number of (keyword, field) matches across head name, tail name and relation type.
    # Keywords and limit are passed as query parameters instead of being pasted into the
    # Cypher string, which avoids quoting problems and Cypher injection.
    query = """
    MATCH (n)-[r]->(m)
    WITH n, r, m,
        size([kw IN $keywords WHERE toLower(n.name) CONTAINS kw]) +
        size([kw IN $keywords WHERE toLower(m.name) CONTAINS kw]) +
        size([kw IN $keywords WHERE toLower(r.type) CONTAINS kw]) AS score
    WHERE score > 0
    RETURN n.name AS Head, r.type AS Relation, m.name AS Tail, score
    ORDER BY score DESC
    LIMIT $limit
    """

    keywords = [kw.lower() for kw in keywords]
    print("[DEBUG] Keywords:", keywords)

    result = graph.run(query, keywords=keywords, limit=limit)
    return [(record["Head"], record["Relation"], record["Tail"]) for record in result]



def answer_question(user_query):
    keywords = extract_keywords(user_query)
    context_tuples = search_graph(graph, keywords)

    if not context_tuples:
        return "🔍 No matching context found in the knowledge graph."

    # Convert list of (head, relation, tail) into string format
    context = "\n".join([f"{h} --{r}--> {t}" for h, r, t in context_tuples])

    # Create the prompt
    prompt = prompt_template.format(context=context, user_query=user_query)

    # LLM call
    llm = Ollama(model="mistral")
    response = llm.invoke(prompt)

    return f"🤖 Answer: {response}\n\n📚 Context Used:\n{context}"


# ---- GRADIO UI ----

demo = gr.Interface(
    fn=answer_question,
    inputs=gr.Textbox(label="Ask a question about your knowledge graph"),
    outputs=gr.Textbox(label="Answer with context"),
    title="Neo4j + Ollama RAG Chatbot",
    description="Ask any question. The bot searches your Neo4j knowledge graph for relevant facts, then generates an answer using a local CPU-friendly LLM.",
)

demo.launch()




* Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.



