Integrating structured EHR modeling with collaborative LLM reasoning for clinical decision support.
Predicting Health Outcomes from electronic health records (EHRs) is challenging because traditional models rely on structured data and often ignore external medical knowledge. PHO-Agents addresses this by combining longitudinal EHR encoding with a multi-agent LLM system to improve both predictive performance and clinical interpretability.
- EHR-based Model — Structured EHR sequences are encoded to produce feature weights and initial logits.
- Data Agent — Converts EHR model outputs and structured EHRs into natural-language patient summaries.
- Retrieval Agent — Gathers relevant clinical literature and guidelines via retrieval-augmented generation.
- Research & Practical Doctor Agents — Independently assess the patient from different clinical perspectives.
- Leader Agent — Synthesizes all agent analyses into a unified reasoning output.
- Fusion — Logit-level fusion of EHR model outputs and LLM agent outputs produces final predictions and explanation reports.
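The fusion step can be illustrated with a minimal sketch. The source does not specify the fusion formula, so this assumes a simple convex combination in logit space, and it assumes the agents' output is available as a risk probability. The names `logit`, `sigmoid`, `fuse`, and the mixing weight `alpha` are hypothetical and are not taken from `fusion.ipynb`.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Map a probability to log-odds, clipping to avoid infinities."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    """Map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fuse(ehr_logit: float, agent_prob: float, alpha: float = 0.5) -> float:
    """Blend the EHR model logit with the agents' risk estimate in logit space.

    alpha is a hypothetical mixing weight (assumption, not from the source):
    1.0 uses the EHR model only, 0.0 uses the agents' estimate only.
    """
    fused_logit = alpha * ehr_logit + (1.0 - alpha) * logit(agent_prob)
    return sigmoid(fused_logit)
```

Calling `fuse(ehr_logit=0.8, agent_prob=0.6, alpha=0.7)` returns a single fused probability. In practice a weight like `alpha` would likely be tuned on a validation split rather than fixed by hand.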
| Cohort | Task | Outcome Window | Source |
|---|---|---|---|
| Acute Kidney Injury (AKI) | In-hospital mortality | During admission | MIMIC-IV v3.1 |
| Chronic Kidney Disease (CKD) | AKI onset | Within 2 years | UFHealth IDR |
| Immune Checkpoint Inhibitor (ICI) therapy | Immune-related adverse events | Within 1 year | UFHealth IDR |
PHO-Agents outperforms both EHR-only deep learning models and LLM-only (single-agent and multi-agent) approaches, achieving stronger discrimination and better precision-recall balance. Beyond accuracy, it generates transparent reasoning chains that explain predictions using medical literature and clinical guidelines, enhancing clinician trust. Crucially, only the EHR model requires training, keeping inference costs under $0.02 per patient and runtimes around 1 minute, making it practical for real-world clinical deployment.
PHO-Agents/
├── assets/ # Figures and static resources
├── baselines/ # Baseline model implementations
├── corpus/ # External medical knowledge corpus
│ ├── guideline/ # Clinical guidelines
│ └── pubmed/ # PubMed literature
├── ehr_agents/ # Multi-agent system modules
│ ├── prompt_template/ # Prompt templates for each agent
│ ├── agents_framework.py # Agent orchestration and coordination
│ ├── data_agent.py # Data agent (EHR → patient summary)
│ ├── retrieve_agent.py # Retrieval agent (guideline/literature RAG)
│ └── ...
├── ehr_datasets/ # Dataset preprocessing and processing
│ ├── CKD/ # Chronic kidney disease cohort
│ ├── ICI/ # Immune checkpoint inhibitor cohort
│ ├── mimic-iv/ # MIMIC-IV (AKI) cohort
│ └── ...
├── ehr_models/ # EHR-based sequential model
│ ├── configs/ # Model configuration files
│ ├── models/ # Model architectures
│ ├── pipelines/ # Training and inference pipelines
│ ├── train_test.py # Model training and evaluation
│ ├── importance.py # Feature importance extraction
│ └── ...
├── agents_outs.py # Collect and format agent outputs
├── collaboration.py # Multi-agent collaboration pipeline
├── environment.yml # Conda environment specification
└── fusion.ipynb # Logit-level fusion and final results
- Conda (Miniconda or Anaconda)
- Python 3.9
- An LLM API key configured in your environment
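Because the agents depend on an API key, it may help to fail early when the key is missing. The sketch below is one way to do that; the variable name `LLM_API_KEY` and the helper `require_api_key` are placeholders, since the source does not say which environment variable the code reads.

```python
import os


def require_api_key(var_name: str = "LLM_API_KEY") -> str:
    """Return the key stored in `var_name`, or raise a clear error if it is unset.

    "LLM_API_KEY" is a placeholder (assumption); use the variable your provider expects.
    """
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Set the {var_name} environment variable before running the agents."
        )
    return key
```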
conda env create -f environment.yml
conda activate phoagents
Navigate to the dataset folder and run the notebooks for your target cohort (aki, ckd, or ici):
cd ehr_datasets
# Open and run in order:
# {dataset_name}.preprocessing.ipynb
# {dataset_name}.processing.ipynb
Repeat for each cohort you wish to evaluate.
cd ehr_models
python train_test.py # trains and evaluates the EHR encoder
python importance.py # extracts feature importance scores
cd .. # return to PHO-Agents root
python collaboration.py # runs the multi-agent reasoning pipeline
python agents_outs.py # collects and formats agent outputs
Open and run fusion.ipynb to perform logit-level fusion and generate final predictions and explanation reports.
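The script steps above can also be chained from a small driver, sketched below under the assumption that each script runs without arguments from the directory shown in the commands. `STEPS`, `build_commands`, and `run_all` are hypothetical helpers, not part of the repository. The dataset notebooks and `fusion.ipynb` are still run by hand.

```python
import subprocess
import sys
from pathlib import Path

# (working directory relative to the repository root, script name), in run order.
STEPS = [
    ("ehr_models", "train_test.py"),
    ("ehr_models", "importance.py"),
    (".", "collaboration.py"),
    (".", "agents_outs.py"),
]


def build_commands(root: Path, python: str = sys.executable):
    """Return (cwd, argv) pairs for each step without executing anything."""
    return [(root / cwd, [python, script]) for cwd, script in STEPS]


def run_all(root: Path) -> None:
    """Run each step in order, stopping at the first failure."""
    for cwd, argv in build_commands(root):
        print(f"[{cwd}] {' '.join(argv)}")
        # check=True raises CalledProcessError if a step exits non-zero.
        subprocess.run(argv, cwd=cwd, check=True)
```

From the PHO-Agents root with the conda environment active, `run_all(Path("."))` would run the four scripts in sequence and stop if any of them fails.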
If you find this work useful, please cite this repo.
This project is licensed under the MIT License.
