🧠 TriAgent: Automated Biomarker Discovery with Deep Research Grounding for Triage in Acute Care by LLM-Based Multi-Agent Collaboration
TriAgent is a large language model (LLM)-based multi-agent framework that couples automated biomarker discovery with deep research for literature-grounded validation and novelty assessment. A supervisor research agent generates research topics and delegates targeted queries to specialized sub-agents, which retrieve evidence from multiple data sources. Findings are synthesized to classify biomarkers as either grounded in existing knowledge or flagged as novel candidates, with transparent justification that highlights unexplored pathways in acute care risk stratification. Unlike prior frameworks limited to routine clinical biomarkers, TriAgent aims to deliver an end-to-end pipeline from data analysis to literature grounding, improving transparency and explainability while expanding the frontier of potentially actionable clinical biomarkers.
TriAgent orchestrates an end-to-end research and modeling pipeline:
- 🧭 Scoping & Research Brief Generation → creates Brief-1
- 📊 Data Analysis + AutoML → performs EDA, trains models, and exports artifacts (Brief-2)
- 🔬 Deep Research with RAG → performs retrieval-augmented research with supervised orchestration
- 📝 Final Report Generation → synthesizes findings into a report (optional PDF export + quality grading)
It unifies LLMs (Anthropic, OpenAI, AWS Bedrock), local RAG, web search/scraping, and AutoML tools (H2O, scikit-learn) into one cohesive system.
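The four stages above can be sketched as a simple state-passing pipeline. This is illustrative only: the real workflow is a LangGraph graph built in graph/, and every function name here is hypothetical.

```python
# Illustrative sketch of the four-stage pipeline as plain functions.
# All function names are hypothetical; the shipped implementation is a
# LangGraph workflow built in graph/graph.py and run via triagent.main.

def scoping(state: dict) -> dict:
    # Stage 1: clarify the goal and emit Brief-1
    state["brief_1"] = f"Research brief for: {state['goal']}"
    return state

def data_analysis(state: dict) -> dict:
    # Stage 2: EDA + AutoML; emit Brief-2 and artifact paths
    state["brief_2"] = "EDA + AutoML summary"
    state["artifacts"] = ["data/automl/run_001/metrics.json"]
    return state

def deep_research(state: dict) -> dict:
    # Stage 3: retrieval-augmented research over the briefs
    state["evidence"] = ["retrieved passage 1", "retrieved passage 2"]
    return state

def final_report(state: dict) -> dict:
    # Stage 4: synthesize everything into a report
    state["report"] = "\n".join(
        [state["brief_1"], state["brief_2"]] + state["evidence"]
    )
    return state

def run_pipeline(goal: str) -> dict:
    state = {"goal": goal}
    for stage in (scoping, data_analysis, deep_research, final_report):
        state = stage(state)
    return state
```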
• 🐍 Python 3.10+ (Docker image uses 3.13)
• ☕ Java runtime for H2O AutoML (default-jre-headless is installed in Docker)
• 🔑 API keys (add to .env): ANTHROPIC_API_KEY, OPENAI_API_KEY, TAVILY_API_KEY, LANGCHAIN_API_KEY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
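A quick way to verify the required keys are present before a run (a minimal stdlib sketch; TriAgent's own startup checks may differ):

```python
import os

# Keys listed in the prerequisites above
REQUIRED_KEYS = [
    "ANTHROPIC_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY",
    "LANGCHAIN_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
]

def missing_keys(env=os.environ) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```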
Build the image (fresh build):

```bash
docker compose build --no-cache triagent
```

Run interactively with mounted local data/:

```bash
docker compose run --rm triagent
```

The workflow is implemented in graph/ and executed by triagent.main.
• Clarifies the user's goal; may pause for human input 🧑‍💻
• Produces Brief-1 with assumptions, open questions, and seed ideas
• Runs EDA and H2O AutoML
• Saves metrics, plots, and model artifacts under data/automl/<run_id>
• Produces Brief-2 summarizing key insights
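The artifact layout can be mimicked with a few lines of stdlib code (a sketch only; the actual file names written by the AutoML stage may differ):

```python
import json
from pathlib import Path

def save_run_artifacts(run_id: str, metrics: dict, base: str = "data/automl") -> Path:
    """Write a metrics.json under <base>/<run_id>/ and return the run dir.

    Mirrors the data/automl/<run_id> layout described above; the
    'metrics.json' file name is an assumption for illustration.
    """
    run_dir = Path(base) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return run_dir
```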
• Performs supervised, multi-agent RAG orchestration
• Uses tools/rag/rag_tool_runner.py for retrieval
• Aggregates evidence and topic-level findings
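Topic-level aggregation of retrieved evidence can be sketched like this (the evidence item shape is a simplified stand-in for the sub-agents' actual retrieval payloads):

```python
from collections import defaultdict

def aggregate_findings(evidence: list[dict]) -> dict[str, list[str]]:
    """Group retrieved evidence snippets by research topic.

    Each item is assumed to look like {"topic": "...", "snippet": "..."} —
    an illustrative schema, not the framework's real payload format.
    """
    by_topic: dict[str, list[str]] = defaultdict(list)
    for item in evidence:
        by_topic[item["topic"]].append(item["snippet"])
    return dict(by_topic)
```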
• Synthesizes all findings into a structured final report
• Optionally exports to PDF (reports/) and grades quality
Key output locations:
• runs/_artifacts/
• data/automl/
• reports/
config.yaml
Central config for model choices and runtime:
• research_model, compression_model, and final_report_model
• Provider IDs (Anthropic / OpenAI / AWS Bedrock)
• Token/temperature settings per stage
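A hypothetical shape for config.yaml, covering the items above (the three model-role keys come from this README; every other key name and all model IDs are placeholders, not the shipped defaults):

```yaml
# Illustrative sketch only — key names beyond the three model roles,
# and all model IDs, are assumptions.
research_model: "anthropic/<model-id>"       # supervisor research agent
compression_model: "openai/<model-id>"       # evidence compression
final_report_model: "bedrock/<model-id>"     # report synthesis
max_tokens: 8000                             # per-stage token cap
temperature: 0.2                             # per-stage sampling setting
```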
Use .env at repo root (mounted to /app/triagent/.env in Docker):
| Category | Variables |
| --- | --- |
| LLMs | ANTHROPIC_API_KEY, OPENAI_API_KEY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION |
| Search | TAVILY_API_KEY (web retrieval), LANGCHAIN_API_KEY (LangSmith tracing) |
| RAG & Caching | TRIAGENT_RAG_SERVER_MODE, TRIAGENT_CHROMA_SERVER_URL, TRIAGENT_MAX_CONCURRENT, TRIAGENT_ENABLE_CACHE, TRIAGENT_CACHE_TTL, TRIAGENT_REDIS_URL |
📘 LLM setup handled in utils/llm_config.py — initializes clients and enforces token limits.
📘 Model selection for different agents can be set from config.yaml
```
TriAgent/
├── main.py              # CLI entrypoint
├── config.yaml          # Model/runtime config
├── graph/               # LangGraph workflow
│   ├── graph.py         # Builds and compiles workflow
│   ├── nodes.py         # Implements scoping, EDA, RAG, reporting
│   ├── edges.py         # Workflow connections
│   └── state.py         # Typed states/schemas
├── agents/              # Specialized research & analysis agents
├── tools/
│   ├── eda/             # Exploratory data analysis
│   ├── automl/          # H2O AutoML integration
│   └── rag/             # RAG server + Chroma vector DB tools
├── utils/               # LLM setup, logging, paths
├── Dockerfile
├── docker-compose.yaml
├── requirements.txt
├── data/                # Input CSVs & vector DBs
└── reports/             # Exported reports
```
✅ Graph-native orchestration (interrupts, checkpoints)
✅ Human-in-the-loop scoping for precision
✅ EDA + AutoML with feature inclusion control
✅ Supervised deep research with integrated RAG
✅ Final report synthesis (PDF + grading)
✅ Multi-provider LLM support (Anthropic, OpenAI, Bedrock)
✅ Dockerized runtime — reproducible & portable
💾 Data placement: put CSV file under data/
⚙️ Feature control: input training and target indices to include only desired features
🧠 Model tuning: set models and providers in config.yaml
🔒 Token safety: outputs are clamped to model token limits
📈 Monitoring: reports and JSON summaries saved under runs/_artifacts
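Index-based feature control can be illustrated with a small stdlib sketch (a hypothetical helper; in the actual run these indices are supplied interactively):

```python
import csv
import io

def select_columns(csv_text: str, feature_idx: list[int], target_idx: int):
    """Split CSV columns into feature rows and target values by index.

    Illustrates the training/target index selection described above;
    this helper is not part of the TriAgent codebase.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    names = [header[i] for i in feature_idx]
    features = [[r[i] for i in feature_idx] for r in body]
    target = [r[target_idx] for r in body]
    return names, features, target
```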
| Provider | Required Keys |
| --- | --- |
| Anthropic | ANTHROPIC_API_KEY |
| OpenAI | OPENAI_API_KEY |
| AWS Bedrock | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION |
| Tavily | TAVILY_API_KEY |
| LangChain | LANGCHAIN_API_KEY (+ optional tracing vars) |
Use env vars:

```bash
TRIAGENT_RAG_SERVER_MODE=true
TRIAGENT_CHROMA_SERVER_URL=http://localhost:8000
```
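One way these two variables could be interpreted at startup (a stdlib sketch; the actual logic lives in tools/rag/, and the embedded-mode fallback path is an assumption):

```python
import os

def chroma_target(env=os.environ) -> str:
    """Return the Chroma endpoint: the HTTP server URL when server mode
    is enabled, otherwise a local embedded path.

    The "data/chroma" fallback path is illustrative, not the real default.
    """
    server_mode = env.get("TRIAGENT_RAG_SERVER_MODE", "").lower() == "true"
    if server_mode:
        return env.get("TRIAGENT_CHROMA_SERVER_URL", "http://localhost:8000")
    return "data/chroma"
```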
TriAgent: Automated Biomarker Discovery with Deep Research Grounding for Triage in Acute Care by LLM-Based Multi-Agent Collaboration
Kerem Delikoyun, Qianyu Chen, Win Sen Kuan, John Tshon Yit Soong, Matthew Edward Cove, Oliver Hayden
https://arxiv.org/abs/2510.16080
```bibtex
@article{delikoyun2025triagent,
  title={TriAgent: Automated Biomarker Discovery with Deep Research Grounding for Triage in Acute Care by LLM-Based Multi-Agent Collaboration},
  author={Delikoyun, Kerem and Chen, Qianyu and Kuan, Win Sen and Soong, John Tshon Yit and Cove, Matthew Edward and Hayden, Oliver},
  journal={arXiv preprint arXiv:2510.16080},
  year={2025}
}
```