cngvng/HERMES

HERMES: Graph-Based Healthcare Prediction Model using Clinical-Text Only

Last updated: 05/01/2026

Overview

This is the official and only repository for the paper "HERMES: Graph-Based Healthcare Prediction Model using Clinical-Text Only", which has been submitted to CITA '26 and is currently under review. If you have trouble reproducing the results, please contact me at the personal email address listed on my GitHub profile.

Abstract

Quick Start

Prerequisites

  1. MIMIC Dataset: Obtain access to MIMIC-III or MIMIC-IV from PhysioNet
  2. PrimeKG: Download the PrimeKG knowledge graph
  3. Environment: Python 3.10+, PyTorch, PyTorch Lightning (see requirements.txt)
  4. Place raw data in dataset/raw/ directory
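Before launching the pipelines, it can help to confirm that the prerequisites above are in place. The following is a minimal sketch, not part of the repository: the function name `check_prerequisites` and its parameters are hypothetical, and it only checks the Python version and that `dataset/raw/` exists and is non-empty. It does not validate the MIMIC or PrimeKG files themselves.

```python
import sys
from pathlib import Path


def check_prerequisites(repo_root=".", min_python=(3, 10)):
    """Return a list of problems found; an empty list means the checks passed.

    Hypothetical helper: the repository does not ship this function.
    """
    problems = []

    # The README asks for Python 3.10 or newer.
    if sys.version_info[:2] < tuple(min_python):
        found = ".".join(str(part) for part in sys.version_info[:2])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ required, found {found}")

    # Raw MIMIC and PrimeKG data are expected under dataset/raw/.
    raw_dir = Path(repo_root) / "dataset" / "raw"
    if not raw_dir.is_dir():
        problems.append(f"missing directory: {raw_dir}")
    elif not any(raw_dir.iterdir()):
        problems.append(f"directory is empty: {raw_dir}")

    return problems
```

Calling `check_prerequisites()` from the repository root returns an empty list when both checks pass; otherwise each entry describes one problem to fix before running the scripts.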

MIMIC-III Pipeline

# Step 1: Preprocess raw MIMIC-III data (EHR, notes, ICD codes)
bash scripts/mimic-iii-preprocess.sh

# Step 2: Run full pipeline (graph extraction, embeddings, training, evaluation)
bash scripts/mimic-iii-full-pipeline.sh

MIMIC-IV Pipeline

# Step 1: Preprocess raw MIMIC-IV data
bash scripts/mimic-iv-preprocess.sh

# Step 2: Run full pipeline
bash scripts/mimic-iv-full-pipeline.sh

Note: Each script contains multiple steps that can be run individually. Review and uncomment the desired steps before execution.

Project Directory Tree

HERMES-EHR/
├── README.md
├── config/                                 # Configuration files
│   ├── experiment_config.yaml              # Training & experiment hyperparameters
│   ├── mimic_iii_config.yaml               # MIMIC-III dataset paths & settings
│   └── mimic_iv_config.yaml                # MIMIC-IV dataset paths & settings
├── dataset/                                # Data folder (not in repo)
│   ├── raw/                                # Raw data (MIMIC, PrimeKG,...)
│   ├── intermediate/                       # Temporary data (processed MIMIC data, splits,...)
│   └── processed/                          # Final training data & processed EHR
├── logs/                                   # Training logs (not in repo)
├── results/                                # Experiment results & metrics
├── papers/                                 # Related scientific papers
│   └── threats/                            # Papers that challenge our research
├── scripts/                                # Bash scripts for pipeline execution
│   ├── mimic-iii-preprocess.sh             # MIMIC-III raw data preprocessing
│   ├── mimic-iii-full-pipeline.sh          # MIMIC-III complete pipeline
│   ├── mimic-iv-preprocess.sh              # MIMIC-IV raw data preprocessing
│   ├── mimic-iv-full-pipeline.sh           # MIMIC-IV complete pipeline
│   └── test.sh                             # Custom script for dev test
└── src/                                    # Main source code
    ├── data/                               # Data processing modules
    │   ├── preprocessing.py                # General data preprocessing utilities
    │   ├── note_graphs.py                  # Clinical notes → knowledge graphs
    │   ├── graph_embedding.py              # Graph embeddings with BGE-M3
    │   ├── create_training.py              # Create final HDF5 training files
    │   └── training_data_split.py          # Train/val/test splitting
    ├── evaluation/                         # Evaluation & metrics
    │   └── evaluation_toolkit.py           # Bootstrap metrics, plots, AUROC/AUPRC
    ├── experiment/                         # Experiment orchestration
    │   └── run_experiment.py               # Main experiment loop & grid search
    ├── KGSum/                              # Knowledge Graph Summarization (LLM-based)
    │   ├── entity_extractor.py             # Extract entities from clinical notes
    │   ├── relation_extractor.py           # Extract relations between entities
    │   ├── kgsum_agent.py                  # Main KGSum orchestration agent
    │   └── prompts.py                      # LLM prompts for KG extraction
    ├── language_models/                    # Language model wrappers
    │   ├── bgem3.py                        # BGE-M3 embedding model
    │   ├── clinical_longformer.py          # Clinical Longformer encoder
    │   └── call_llm_mistral.py             # Mistral API wrapper
    ├── mimic-preprocessing/                # MIMIC dataset preprocessing
    ├── models/                             # Neural network architectures
    ├── training/                           # Training components
    │   ├── data_loader.py                  # PyTorch Lightning DataModule
    │   ├── emerge.py                       # EMERGE multimodal fusion model
    │   ├── ehr_encoder.py                  # EHR time-series encoder (Raindrop)
    │   ├── graph_encoder.py                # GNN encoder (GCN/GAT/RGCN)
    │   └── text_fusion.py                  # Text modality fusion layers
    └── utils/                              # Utility functions
        ├── files_loader.py                 # File I/O (YAML, CSV, JSON, H5)
        ├── logging.py                      # Logging configuration
        └── cleanup.py                      # Resource cleanup utilities
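The tree marks `dataset/` and `logs/` as not tracked in the repository, so a fresh clone will likely lack them. The sketch below creates those folders with `pathlib`. It is an illustration only: the helper name `create_local_dirs` and the `LOCAL_DIRS` constant are hypothetical, and the pipeline scripts may already create some of these directories themselves.

```python
from pathlib import Path

# Folders the directory tree lists under dataset/ and logs/ (not in the repo).
LOCAL_DIRS = (
    "dataset/raw",
    "dataset/intermediate",
    "dataset/processed",
    "logs",
)


def create_local_dirs(repo_root="."):
    """Create any missing local folders and return the ones that were created.

    Hypothetical helper: the repository does not ship this function.
    """
    created = []
    for rel in LOCAL_DIRS:
        path = Path(repo_root) / rel
        if not path.exists():
            # parents=True also creates dataset/ itself when needed.
            path.mkdir(parents=True)
            created.append(rel)
    return created
```

Running `create_local_dirs()` from the repository root is safe to repeat: existing folders are left untouched and only newly created ones are returned.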

About

HERMES-EHR: Hierarchical Knowledge-Graph–Guided Multi-Retrieve and Agentic Fusion for Multimodal Electronic Health Records
