A conceptual demo hosted on Streamlit is available here: https://kg-rag-demo-8vjfpvifgemxgmvtocmly3.streamlit.app/
(The demo uses mock data.)
Data folders in this repository hold the following information:
- `data/logs` holds the log database `log.db` of live runs made in `streamlit_app.py`.
- `data/patient_notes` holds data for testing with `scripts/batch_testing.py`.
Scripts folder contains:
- `analyze_batch_outputs.py`: inspection of test data
- `batch_testing.py`: main pipeline script, set up for running full datasets
- `extraction.py`: extraction of relationship triplets from scraped files
- `judge_utils.py`: judge configuration script
- `scrape.py`: script used to scrape relevant files from sundhed.dk
- `vector_search_transformer.py`: script for creating vector embeddings of diagnosis nodes
The root folder holds:
- `requirements.txt`: lists the packages to install before running.
- `streamlit_app.py`: the live Streamlit version of the pipeline.
- Other standard GitHub elements.
We reached out to the relevant actors to obtain a permission agreement. Obtain the relevant permissions yourself if needed.
- `scripts/scrape.py` to scrape files.
- `scripts/extraction.py` to extract relationship triplets.
- `scripts/vector_search_transformer.py` to create vector embeddings. This uses the model `sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2`.
- `scripts/judge_utils.py` holds the judge configuration.
- `scripts/batch_testing.py` runs the xlsx files located in `data/patient_notes` through the entire pipeline.
- `scripts/analyze_batch_outputs.py` allows inspection of run files from batch testing.
- The model used throughout is `gemma3:27b`.
`streamlit_app.py` is the live version of the application. It uses only the Weighted-KG variant unless changed.
BASE_URL1 =
BASE_URL2 =
BASE_URL3 =
ENDPOINT =
NEO4J_URI=
NEO4J_USERNAME=
NEO4J_PASSWORD=
NEO4J_DATABASE=
If not applicable to your case, change scripts accordingly.
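As a quick sanity check before running anything, the variables listed above can be validated with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `parse_env_file` and `missing_vars` are hypothetical, and how the repository's scripts actually load these values is not shown here and may differ.

```python
import os

# Variable names taken from the list above. How the scripts read them
# (for example through a dotenv loader) is an assumption.
REQUIRED_VARS = [
    "BASE_URL1", "BASE_URL2", "BASE_URL3", "ENDPOINT",
    "NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "NEO4J_DATABASE",
]


def parse_env_file(path):
    """Parse simple KEY=VALUE lines, tolerating spaces around '=' and skipping comments."""
    values = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(values, required=REQUIRED_VARS):
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    # Hypothetical usage: report unset variables in a local .env file.
    if os.path.exists(".env"):
        print("Missing or empty:", missing_vars(parse_env_file(".env")))
```

Any name reported as missing either needs a value or, if it does not apply to your setup, needs the corresponding script adjusted.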
scripts/batch_testing.py
This script runs a batch pipeline over Excel files containing patient notes and expected diagnoses. It extracts clinical retrieval terms, retrieves diagnosis candidates from vector indexes, expands related knowledge graph relations through Neo4j, reranks candidates with a local LLM, generates diagnosis outputs, and optionally evaluates the results with judge functions.
For each row in the input Excel files, the pipeline:
- Reads a patient note from the `pso_note` column.
- Reads the expected diagnosis from the `diagnosis` column.
- Uses a local LLM to extract structured Danish clinical retrieval terms.
- Searches diagnosis-like nodes using hybrid dense + lexical retrieval.
- Reranks diagnosis candidates with the local LLM.
- Expands selected diagnosis nodes into nearby Neo4j knowledge graph relations.
- Reranks retrieved KG relations with the local LLM.
- Builds one or more diagnostic prompts:
  - with KG context
  - without KG context
  - with KG context weighted as main evidence
- Generates diagnostic answers.
- Optionally runs judge evaluations.
- Saves one JSON output per processed row.
- Saves a manifest file summarizing the batch run.
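To illustrate the hybrid dense + lexical retrieval step in the list above, here is a minimal sketch of one common way to fuse the two signals: a weighted sum of cosine similarity and term overlap. The function names, the `alpha` weight, the candidate dictionary layout, and the toy data are all assumptions made for illustration; the fusion actually used in `scripts/batch_testing.py` may be different.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def lexical_overlap(query_terms, doc_terms):
    """Fraction of query terms that also appear among the document terms."""
    query = {t.lower() for t in query_terms}
    doc = {t.lower() for t in doc_terms}
    return len(query & doc) / len(query) if query else 0.0


def hybrid_rank(query_vec, query_terms, candidates, alpha=0.7, top_k=3):
    """Score each candidate as alpha * dense + (1 - alpha) * lexical, best first."""
    scored = []
    for cand in candidates:
        dense = cosine(query_vec, cand["embedding"])
        lexical = lexical_overlap(query_terms, cand["terms"])
        scored.append((alpha * dense + (1 - alpha) * lexical, cand["name"]))
    scored.sort(reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy two-dimensional embeddings, purely illustrative.
    toy_candidates = [
        {"name": "astma", "embedding": [1.0, 0.0], "terms": ["hoste", "åndenød"]},
        {"name": "migræne", "embedding": [0.0, 1.0], "terms": ["hovedpine"]},
    ]
    for score, name in hybrid_rank([1.0, 0.0], ["hoste"], toy_candidates):
        print(name, round(score, 3))
```

A weighted sum keeps the behaviour easy to reason about: raising `alpha` favours semantic similarity, while lowering it favours exact term matches, which can matter for clinical vocabulary.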
The script assumes the following project structure:
project-root/
├── .env
├── scripts/
│   └── your_script.py
├── data/
│   ├── patient_notes/
│   │   └── input_file.xlsx
│   ├── batch_outputs/
│   ├── vector_search/
│   │   ├── relationship_index.json
│   │   ├── diagnosis_index.json
│   │   ├── relationship_lexical.json
│   │   ├── diagnosis_lexical.json
│   │   ├── relationship_embeddings.npy
│   │   └── diagnosis_embeddings.npy
│   └── judge_debug/
└── judge_utils.py
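Because the script depends on this layout, it can help to verify it before starting a long batch run. The following is a small sketch under the assumption that all paths are resolved relative to the project root; `check_layout` is a hypothetical helper, not part of the repository, and whether the script creates the output folders itself is not stated here.

```python
from pathlib import Path

# Paths copied from the structure above, relative to the project root.
EXPECTED_FILES = [
    ".env",
    "data/vector_search/relationship_index.json",
    "data/vector_search/diagnosis_index.json",
    "data/vector_search/relationship_lexical.json",
    "data/vector_search/diagnosis_lexical.json",
    "data/vector_search/relationship_embeddings.npy",
    "data/vector_search/diagnosis_embeddings.npy",
]
EXPECTED_DIRS = [
    "data/patient_notes",
    "data/batch_outputs",
    "data/judge_debug",
]


def check_layout(project_root):
    """Return the expected files and folders that are missing under project_root."""
    root = Path(project_root)
    missing = [rel for rel in EXPECTED_FILES if not (root / rel).is_file()]
    missing += [rel for rel in EXPECTED_DIRS if not (root / rel).is_dir()]
    return missing


if __name__ == "__main__":
    # Hypothetical usage: run from the project root.
    for rel in check_layout("."):
        print("missing:", rel)
```

An empty result means every listed file and folder was found; it says nothing about whether the index and embedding files are mutually consistent.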