- Allen Baron (University of Maryland - Baltimore)
- Yibei Chen (MIT)
- Anne Ketter (Computercraft)
- Samarpan Mohanty (University of Nebraska - Lincoln)
- Evan Molinelli (Chan Zuckerberg Initiative)
- Van Truong (University of Pennsylvania)
Biomedical knowledge graphs (KGs) are powerful tools for linking genes, diseases, and phenotypes — but when AI models generate new edges, they often hallucinate or introduce errors. Our project focuses on pruning these errors. We show how combining human review, grounded AI, and graph learning can work together to keep biomedical knowledge graphs accurate and trustworthy.
The KG Model Garbage Collection Tool is a proof-of-concept that:
- Starts with a trusted subset of the Monarch KG (with edges randomly removed).
- Fills in missing edges using three approaches to simulate a real-world, messy knowledge graph: random assignment (control), a general LLM, and an LLM with biomedical RAG.
- Invites participants, such as subject-matter experts (SMEs), to validate a sample of these edges through a simple interface, measuring how close each method comes to the truth.
- Uses this validation data to train a graph neural network (GNN) that learns to spot questionable edges and flag them for review and removal.
- Tests the resulting knowledge graph against the original, trusted knowledge graph.
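The loop above can be sketched in a few lines of Python. This is a toy illustration with made-up data and function names (not code from this repo), and the random-predicate guess stands in for all three assignment strategies:

```python
import random

def simulate_pipeline(trusted_edges, removal_fraction=0.2, seed=0):
    """Toy sketch of the PoC loop: hold out edges, refill them, score the refill."""
    rng = random.Random(seed)
    held_out = rng.sample(trusted_edges, int(len(trusted_edges) * removal_fraction))

    # Stand-in for the three refill strategies (random control / LLM / LLM + RAG):
    # here we simply guess a random predicate for each broken (subject, object) pair.
    predicates = sorted({p for _, p, _ in trusted_edges})
    refilled = [(s, rng.choice(predicates), o) for s, _, o in held_out]

    # "Human review" in this toy is exact matching against the trusted edge.
    correct = sum(guess == truth for guess, truth in zip(refilled, held_out))
    return correct / max(len(held_out), 1)

toy_kg = [("GeneA", "associated_with", "Disease1"),
          ("GeneB", "causes", "Disease2"),
          ("Disease1", "has_phenotype", "Pheno1"),
          ("GeneC", "associated_with", "Disease2"),
          ("GeneD", "causes", "Disease3")]
accuracy = simulate_pipeline(toy_kg, removal_fraction=0.4)
```

In the real tool the refill step is an LLM (with or without RAG) and the review step is a human in the interface; the GNN then learns from those review decisions.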
Large Language Models (LLMs) are increasingly used to scale up knowledge graphs, but they introduce errors and hallucinations. In biomedicine, these mistakes can have real-world consequences. While Human-in-the-Loop (HITL) approaches can mitigate risks, they are not scalable solutions for large, complex knowledge graphs.
The KG Model Garbage Collection Tool provides a proof-of-concept (PoC) framework allowing curators to probe a KG using real scientific questions, provide feedback, and use that feedback to train a GNN. This tool extends the impact of human curation by learning from expert human validation patterns.

- Collaboratively find and remove problem edges.
- Isolate part of a large knowledge graph to curate a smaller, workable dataset.
- Teach a GNN to find problems and curate only the problems it identifies.
- Check the flagged problems manually and iterate until the team agrees on what to prune.
- Python 3.8 or higher
- Node.js 14.x or higher (for frontend components)
- AWS CLI configured with appropriate credentials
- Access to PubMed E-utilities (for RAG functionality)
Key Python dependencies include: `pandas`, `numpy`, `boto3` (AWS SDK), `sentence-transformers`, `chromadb`, `langchain`, and `requests`.
Configure AWS credentials for Bedrock access:
```shell
aws configure
```
Ensure access to the following AWS services:
- Amazon Bedrock (for LLM inference)
- Appropriate IAM permissions for model access
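As a sketch of what LLM inference through Bedrock looks like from Python, the snippet below uses `boto3`'s `bedrock-runtime` client with Anthropic's Bedrock request schema. The model ID is an example; use whichever model your account has enabled:

```python
import json

def build_claude_body(prompt: str, max_tokens: int = 256) -> str:
    """Build the request body for an Anthropic model on Bedrock."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    })

def ask_bedrock(prompt: str,
                model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> str:
    """Send one prompt to Bedrock and return the model's text reply."""
    import boto3  # deferred so the helper above works without AWS installed
    client = boto3.client("bedrock-runtime")  # region/profile read from your AWS config
    resp = client.invoke_model(modelId=model_id, body=build_claude_body(prompt))
    payload = json.loads(resp["body"].read())
    return payload["content"][0]["text"]
```

Calling `ask_bedrock("...")` requires the credentials and model-access permissions described above.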
- Clone the repository:

  ```shell
  git clone https://github.com/collaborativebioinformatics/Model_Garbage_Collection.git
  cd Model_Garbage_Collection
  ```

- Install dependencies:

  ```shell
  pip install .
  ```

- Configure environment variables:

  ```shell
  export AWS_REGION=your-region
  export AWS_PROFILE=your-profile
  ```
Here's an overview of our filetree in this repo.
```
Model_Garbage_Collection/
├── app/
│   └── frontend/                 # React frontend application
├── data/                         # Input datasets
│   ├── alzheimers_nodes.json
│   └── alzheimers_triples.csv
├── notebooks/                    # Jupyter notebooks for analysis
│   └── model_testing.ipynb
├── outputs/                      # Generated results and datasets
│   └── cytoscape/                # Graph visualization files
├── src/                          # Core source code
│   ├── gnn/                      # Graph Neural Network components
│   │   ├── lcilp/                # Link prediction implementation
│   │   │   ├── data/             # Training datasets
│   │   │   ├── ensembling/       # Model ensemble methods
│   │   │   │   ├── blend.py
│   │   │   │   ├── compute_auc.py
│   │   │   │   └── score_triplets_kge.py
│   │   │   ├── kge/              # Knowledge graph embeddings
│   │   │   │   ├── dataloader.py
│   │   │   │   ├── model.py
│   │   │   │   └── run.py
│   │   │   ├── managers/         # Training and evaluation
│   │   │   │   ├── evaluator.py
│   │   │   │   └── trainer.py
│   │   │   ├── model/            # Neural network architectures
│   │   │   │   └── dgl/
│   │   │   │       ├── aggregators.py
│   │   │   │       ├── graph_classifier.py
│   │   │   │       ├── layers.py
│   │   │   │       └── rgcn_model.py
│   │   │   ├── subgraph_extraction/  # Graph sampling
│   │   │   │   ├── datasets.py
│   │   │   │   ├── graph_sampler.py
│   │   │   │   └── multicom.py
│   │   │   ├── utils/            # Utility functions
│   │   │   ├── graph_sampler.py
│   │   │   ├── score_edges.py
│   │   │   ├── train.py
│   │   │   └── test_*.py
│   │   ├── extract.py
│   │   ├── model.py
│   │   └── README_HITL.md
│   ├── knowledge-graph/          # Knowledge graph processing
│   │   ├── create_cytoscape_files.py
│   │   ├── download_nodes.py
│   │   ├── download.py
│   │   ├── extract.py
│   │   ├── synthetic_llm.py
│   │   ├── synthetic_random.py
│   │   └── triples_to_csv.py
│   └── ux/                       # User experience components
│       ├── chat.py
│       └── select_edges_for_review.py
├── Edge_Assignor.ipynb           # Main RAG pipeline notebook
├── main.py                       # Main application entry point
├── logo.svg
├── pyproject.toml                # Python project configuration
└── README.md                     # Project documentation
```
- Knowledge Graph Extraction: Download subgraphs from the Monarch Knowledge Graph, including node metadata (identifiers, labels, descriptions) - src/knowledge-graph/download.py
- Data Preprocessing: Convert graph triples from JSON to structured CSV format for analysis - src/knowledge-graph/triples_to_csv.py
- Edge Removal & Assignment Methodologies: Systematically remove a percentage of edges from trusted graph data to create incomplete subgraphs, then reassign them using three strategies (random control, general LLM, LLM with biomedical RAG) to create the test KGs - Edge_Assignor.ipynb. See README-Edge_Assignor.md for details on the RAG pipeline.
- Validation Framework: Compare predicted edges against ground truth using exact matching and validation scoring
- Graph Neural Network Training: Extract graph backbones for GNN input and training on validation patterns
- Extract Graph Backbone:

  ```shell
  python src/knowledge-graph/extract.py
  ```

- Prepare Training Data (see src/gnn/README_HITL.md for details):

  ```shell
  bash src/gnn/run_hitl_prep.sh
  ```

- Train GNN Model (see src/gnn/lcilp/README.md for details):

  ```shell
  python src/gnn/lcilp/train.py
  ```
Simulated Human Curation: A Python script that simulates human review by comparing assigned edges against ground truth, generating curated datasets for GNN training - src/human_simulator.py
- Ground Truth Comparison: Systematic comparison against trusted Monarch KG data
- Accuracy Metrics: Predicate matching rates, precision, and recall calculations
- Error Analysis: Categorization of prediction errors and failure modes
- Human Validation Interface: Prototype of an interactive web browser tool to collect expert review and feedback
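As a sketch of the metrics above (our own helper, not code from this repo), precision, recall, and a predicate-match rate over triples can be computed like this:

```python
def edge_metrics(predicted, truth):
    """Precision/recall over (subject, predicate, object) triples, plus the
    predicate-match rate for edges whose endpoints exist in the ground truth."""
    predicted, truth = set(predicted), set(truth)
    tp = len(predicted & truth)  # exact-match true positives
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(truth) if truth else 0.0

    # Predicate matching: among predicted edges whose (subject, object) pair
    # appears in the ground truth, how often is the predicate also correct?
    # (Assumes one predicate per pair, which holds for this simple sketch.)
    truth_pairs = {(s, o): p for s, p, o in truth}
    scored = [p == truth_pairs[(s, o)]
              for s, p, o in predicted if (s, o) in truth_pairs]
    predicate_match = sum(scored) / len(scored) if scored else 0.0
    return {"precision": precision, "recall": recall,
            "predicate_match": predicate_match}
```

Separating exact-match precision/recall from predicate matching helps distinguish "wrong relationship" errors from outright hallucinated edges.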
Prepare cytoscape visualization files: Create files for visualization in Cytoscape with node & edge data for each rebuilt knowledge graph & associated backbones - src/knowledge-graph/create_cytoscape_files.py
The GUI we built is a simulated example.
We built this prototype over 3 days as a hackathon team. We're stoked about it and are considering extending it in the future. We welcome any contributors or folks who want to continue building off our proof-of-concept.
We welcome contributions from the biomedical informatics and AI research communities. Please submit feedback and requests as 'issues'!
See LICENSE file.
The KG Model Garbage Collection tool uses and displays data and algorithms from the Monarch Initiative. The Monarch Initiative (https://monarchinitiative.org) makes biomedical knowledge exploration more efficient and effective by providing tools for genotype-phenotype analysis, genomic diagnostics, and precision medicine across broad areas of disease. We acknowledge the contributions of domain experts and the broader biomedical informatics community.