Demo

Find Conceptual Demo hosted via streamlit here: https://kg-rag-demo-8vjfpvifgemxgmvtocmly3.streamlit.app/

(The demo uses mock data)

Repository Information

Data folders in this repository hold the following information:

data/logs holds log database "log.db" of live runs made in "streamlit_app.py"
data/patient_notes holds data for testing with scripts/batch_testing.py.

Scripts folder contains:

analyze_batch_outputs.py inspection of test data
batch_testing.py main pipeline script, setup for running full datasets
extraction.py extraction of relationship triplets from scrape files
judge_utils.py judge configuration script
scrape.py script used to scrape relevant files from sundhed.dk
vector_search_transformer.py script for creating vector embeddings of diagnosis nodes

The root folder holds:

requirements.txt needed to run for installation of packages.
streamlit_app.py the live streamlit version of pipeline.
and other standard github elements.

Reproduce our results with the following setup:

We reached out to relevant actors to get a permission agreement. Get relevant permissions if needed.

Get Data Foundation:

scripts/scrape.py to scrape files.
scripts/extraction.py to extract relationship triplets.
scripts/vector_search_transformer to create vector embeddings. This uses the model sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2.

Pipeline Setup:

scripts/judge_utils.py holds judge configuration
scripts/batch_testing.py runs xlsx data located in data/patient_notes file through the entire pipeline.
scripts/analyze_batch_outputs.py allows inspection of run files from batch testing.
The model used throughout is gemma3:27b

Live Testing:

streamlit_app.py live version of the application. Uses only the Weighted-KG variant unless changed.

Setup `.env` file with the following for direct application:

BASE_URL1 =
BASE_URL2 =
BASE_URL3 =
ENDPOINT =
NEO4J_URI=
NEO4J_USERNAME=
NEO4J_PASSWORD=
NEO4J_DATABASE=

If not applicable to your case, change scripts accordingly.

Overview of main pipeline:

scripts/batch_testing.py

Batch Clinical KG Diagnosis Pipeline

This script runs a batch pipeline over Excel files containing patient notes and expected diagnoses. It extracts clinical retrieval terms, retrieves diagnosis candidates from vector indexes, expands related knowledge graph relations through Neo4j, reranks candidates with a local LLM, generates diagnosis outputs, and optionally evaluates the results with judge functions.

What the script does

For each row in the input Excel files, the pipeline:

Reads a patient note from the pso_note column.
Reads the expected diagnosis from the diagnosis column.
Uses a local LLM to extract structured Danish clinical retrieval terms.
Searches diagnosis-like nodes using hybrid dense + lexical retrieval.
Reranks diagnosis candidates with the local LLM.
Expands selected diagnosis nodes into nearby Neo4j knowledge graph relations.
Reranks retrieved KG relations with the local LLM.
Builds one or more diagnostic prompts:
- with KG context
- without KG context
- with KG context weighted as main evidence
Generates diagnostic answers.
Optionally runs judge evaluations.
Saves one JSON output per processed row.
Saves a manifest file summarizing the batch run.

Expected project structure

The script assumes the following project structure:

project-root/
├── .env
├── scripts/
│   └── your_script.py
├── data/
│   ├── patient_notes/
│   │   └── input_file.xlsx
│   ├── batch_outputs/
│   ├── vector_search/
│   │   ├── relationship_index.json
│   │   ├── diagnosis_index.json
│   │   ├── relationship_lexical.json
│   │   ├── diagnosis_lexical.json
│   │   ├── relationship_embeddings.npy
│   │   └── diagnosis_embeddings.npy
│   └── judge_debug/
└── judge_utils.py

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
data		data
scripts		scripts
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
streamlit_app.py		streamlit_app.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Demo

Repository Information

Reproduce our results with the following setup:

Get Data Foundation:

Pipeline Setup:

Live Testing:

Setup `.env` file with the following for direct application:

Overview of main pipeline:

Batch Clinical KG Diagnosis Pipeline

What the script does

Expected project structure

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Demo

Repository Information

Reproduce our results with the following setup:

Get Data Foundation:

Pipeline Setup:

Live Testing:

Setup .env file with the following for direct application:

Overview of main pipeline:

Batch Clinical KG Diagnosis Pipeline

What the script does

Expected project structure

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Setup `.env` file with the following for direct application:

Packages