🧠 TriAgent: Automated Biomarker Discovery with Deep Research Grounding for Triage in Acute Care by LLM-Based Multi-Agent Collaboration
TriAgent is a large language model (LLM)-based multi-agent framework that couples automated biomarker discovery with deep research for literature-grounded validation and novelty assessment. A supervisor research agent generates research topics and delegates targeted queries to specialized sub-agents, which retrieve evidence from multiple data sources. Findings are synthesized to classify biomarkers as either grounded in existing knowledge or flagged as novel candidates, with transparent justification that highlights unexplored pathways in acute care risk stratification. Unlike prior frameworks limited to routine clinical biomarkers, TriAgent aims to deliver an end-to-end pipeline from data analysis to literature grounding, improving transparency and explainability while expanding the frontier of potentially actionable clinical biomarkers.
TriAgent orchestrates an end-to-end research and modeling pipeline:
- 🧭 Scoping & Research Brief Generation → creates Brief-1
- 📊 Data Analysis + AutoML → performs EDA, trains models, and exports artifacts (Brief-2)
- 🔬 Deep Research with RAG → performs retrieval-augmented research with supervised orchestration
- 📝 Final Report Generation → synthesizes findings into a report (optional PDF export + quality grading)
It unifies LLMs (Anthropic, OpenAI, AWS Bedrock), local RAG, web search/scraping, and AutoML tools (H2O, scikit-learn) into one cohesive system.
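The four stages above can be sketched as a simple state-passing pipeline. This is illustrative only: the real workflow is a LangGraph graph built in graph/, and every function name here is hypothetical.

```python
# Illustrative sketch of the four-stage pipeline as plain functions.
# All function names are hypothetical; the shipped implementation is a
# LangGraph workflow built in graph/graph.py and run via triagent.main.

def scoping(state: dict) -> dict:
    # Stage 1: clarify the goal and emit Brief-1
    state["brief_1"] = f"Research brief for: {state['goal']}"
    return state

def data_analysis(state: dict) -> dict:
    # Stage 2: EDA + AutoML; emit Brief-2 and artifact paths
    state["brief_2"] = "EDA + AutoML summary"
    state["artifacts"] = ["data/automl/run_001/metrics.json"]
    return state

def deep_research(state: dict) -> dict:
    # Stage 3: retrieval-augmented research over the briefs
    state["evidence"] = ["retrieved passage 1", "retrieved passage 2"]
    return state

def final_report(state: dict) -> dict:
    # Stage 4: synthesize everything into a report
    state["report"] = "\n".join(
        [state["brief_1"], state["brief_2"]] + state["evidence"]
    )
    return state

def run_pipeline(goal: str) -> dict:
    state = {"goal": goal}
    for stage in (scoping, data_analysis, deep_research, final_report):
        state = stage(state)
    return state
```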
• 🐍 Python 3.10+ (Docker image uses 3.13)
• ☕ Java runtime for H2O AutoML (default-jre-headless is installed in Docker)
• 🔑 API keys (add to .env): ANTHROPIC_API_KEY, OPENAI_API_KEY, TAVILY_API_KEY, LANGCHAIN_API_KEY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION
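A quick way to verify the required keys are present before a run (a minimal stdlib sketch; TriAgent's own startup checks may differ):

```python
import os

# Keys listed in the prerequisites above
REQUIRED_KEYS = [
    "ANTHROPIC_API_KEY", "OPENAI_API_KEY", "TAVILY_API_KEY",
    "LANGCHAIN_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
]

def missing_keys(env=os.environ) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [k for k in REQUIRED_KEYS if not env.get(k)]
```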
Build the image (fresh build):

```bash
docker compose build --no-cache triagent
```

Run interactively with mounted local data/:

```bash
docker compose run --rm triagent
```

The workflow is implemented in graph/ and executed by triagent.main.
• Clarifies the user's goal; may pause for human input 🧑‍💻
• Produces Brief-1 with assumptions, open questions, and seed ideas
• Runs EDA and H2O AutoML
• Saves metrics, plots, and model artifacts under data/automl/<run_id>
• Produces Brief-2 summarizing key insights
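The artifact layout can be mimicked with a few lines of stdlib code (a sketch only; the actual file names written by the AutoML stage may differ):

```python
import json
from pathlib import Path

def save_run_artifacts(run_id: str, metrics: dict, base: str = "data/automl") -> Path:
    """Write a metrics.json under <base>/<run_id>/ and return the run dir.

    Mirrors the data/automl/<run_id> layout described above; the
    'metrics.json' file name is an assumption for illustration.
    """
    run_dir = Path(base) / run_id
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
    return run_dir
```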
• Performs supervised, multi-agent RAG orchestration
• Uses tools/rag/rag_tool_runner.py for retrieval
• Aggregates evidence and topic-level findings
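Topic-level aggregation of retrieved evidence can be sketched like this (the evidence item shape is a simplified stand-in for the sub-agents' actual retrieval payloads):

```python
from collections import defaultdict

def aggregate_findings(evidence: list[dict]) -> dict[str, list[str]]:
    """Group retrieved evidence snippets by research topic.

    Each item is assumed to look like {"topic": "...", "snippet": "..."} —
    an illustrative schema, not the framework's real payload format.
    """
    by_topic: dict[str, list[str]] = defaultdict(list)
    for item in evidence:
        by_topic[item["topic"]].append(item["snippet"])
    return dict(by_topic)
```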
• Synthesizes all findings into a structured final report
• Optionally exports to PDF (reports/) and grades quality
Key output locations:
• runs/_artifacts/
• data/automl/
• reports/
config.yaml
Central config for model choices and runtime:
• research_model, compression_model, and final_report_model
• Provider IDs (Anthropic / OpenAI / AWS Bedrock)
• Token/temperature settings per stage
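A hypothetical shape for config.yaml, covering the items above (the three model-role keys come from this README; every other key name and all model IDs are placeholders, not the shipped defaults):

```yaml
# Illustrative sketch only — key names beyond the three model roles,
# and all model IDs, are assumptions.
research_model: "anthropic/<model-id>"       # supervisor research agent
compression_model: "openai/<model-id>"       # evidence compression
final_report_model: "bedrock/<model-id>"     # report synthesis
max_tokens: 8000                             # per-stage token cap
temperature: 0.2                             # per-stage sampling setting
```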
Use .env at repo root (mounted to /app/triagent/.env in Docker):
| Category | Variables |
| --- | --- |
| LLMs | ANTHROPIC_API_KEY, OPENAI_API_KEY, AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION |
| Search | TAVILY_API_KEY (web retrieval), LANGCHAIN_API_KEY (LangSmith tracing) |
| RAG & Caching | TRIAGENT_RAG_SERVER_MODE, TRIAGENT_CHROMA_SERVER_URL, TRIAGENT_MAX_CONCURRENT, TRIAGENT_ENABLE_CACHE, TRIAGENT_CACHE_TTL, TRIAGENT_REDIS_URL |
📘 LLM setup handled in utils/llm_config.py — initializes clients and enforces token limits.
📘 Model selection for different agents can be set from config.yaml
```
TriAgent/
├── main.py              # CLI entrypoint
├── config.yaml          # Model/runtime config
├── graph/               # LangGraph workflow
│   ├── graph.py         # Builds and compiles workflow
│   ├── nodes.py         # Implements scoping, EDA, RAG, reporting
│   ├── edges.py         # Workflow connections
│   └── state.py         # Typed states/schemas
├── agents/              # Specialized research & analysis agents
├── tools/
│   ├── eda/             # Exploratory data analysis
│   ├── automl/          # H2O AutoML integration
│   └── rag/             # RAG server + Chroma vector DB tools
├── utils/               # LLM setup, logging, paths
├── Dockerfile
├── docker-compose.yaml
├── requirements.txt
├── data/                # Input CSVs & vector DBs
└── reports/             # Exported reports
```
✅ Graph-native orchestration (interrupts, checkpoints)
✅ Human-in-the-loop scoping for precision
✅ EDA + AutoML with feature inclusion control
✅ Supervised deep research with integrated RAG
✅ Final report synthesis (PDF + grading)
✅ Multi-provider LLM support (Anthropic, OpenAI, Bedrock)
✅ Dockerized runtime — reproducible & portable
💾 Data placement: put CSV file under data/
⚙️ Feature control: input training and target indices to include only desired features
🧠 Model tuning: set models and providers in config.yaml
🔒 Token safety: outputs are clamped to model token limits
📈 Monitoring: reports and JSON summaries saved under runs/_artifacts
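Index-based feature control can be illustrated with a small stdlib sketch (a hypothetical helper; in the actual run these indices are supplied interactively):

```python
import csv
import io

def select_columns(csv_text: str, feature_idx: list[int], target_idx: int):
    """Split CSV columns into feature rows and target values by index.

    Illustrates the training/target index selection described above;
    this helper is not part of the TriAgent codebase.
    """
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, body = rows[0], rows[1:]
    names = [header[i] for i in feature_idx]
    features = [[r[i] for i in feature_idx] for r in body]
    target = [r[target_idx] for r in body]
    return names, features, target
```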
| Provider | Required Keys |
| --- | --- |
| Anthropic | ANTHROPIC_API_KEY |
| OpenAI | OPENAI_API_KEY |
| AWS Bedrock | AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_REGION |
| Tavily | TAVILY_API_KEY |
| LangChain | LANGCHAIN_API_KEY (+ optional tracing vars) |
Use env vars:

```bash
TRIAGENT_RAG_SERVER_MODE=true
TRIAGENT_CHROMA_SERVER_URL=http://localhost:8000
```
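One way these two variables could be interpreted at startup (a stdlib sketch; the actual logic lives in tools/rag/, and the embedded-mode fallback path is an assumption):

```python
import os

def chroma_target(env=os.environ) -> str:
    """Return the Chroma endpoint: the HTTP server URL when server mode
    is enabled, otherwise a local embedded path.

    The "data/chroma" fallback path is illustrative, not the real default.
    """
    server_mode = env.get("TRIAGENT_RAG_SERVER_MODE", "").lower() == "true"
    if server_mode:
        return env.get("TRIAGENT_CHROMA_SERVER_URL", "http://localhost:8000")
    return "data/chroma"
```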
TriAgent: Automated Biomarker Discovery with Deep Research Grounding for Triage in Acute Care by LLM-Based Multi-Agent Collaboration
Kerem Delikoyun, Qianyu Chen, Win Sen Kuan, John Tshon Yit Soong, Matthew Edward Cove, Oliver Hayden
https://arxiv.org/abs/2510.16080
```bibtex
@article{delikoyun2025triagent,
  title={TriAgent: Automated Biomarker Discovery with Deep Research Grounding for Triage in Acute Care by LLM-Based Multi-Agent Collaboration},
  author={Delikoyun, Kerem and Chen, Qianyu and Kuan, Win Sen and Soong, John Tshon Yit and Cove, Matthew Edward and Hayden, Oliver},
  journal={arXiv preprint arXiv:2510.16080},
  year={2025}
}
```