
BioCortex: A Next-Generation AI Agent for All Biological Systems

BioCortex is a multi-agent biological AI framework designed to plan, execute, and synthesize complex research workflows across all biological domains (genomics, ecology, neuroscience, agriculture, marine biology, and more).

It extends Biomni-style tool execution with:

  • 3-Stage Hybrid Retrieval ⚡ (vector + knowledge graph + LLM rerank) — 3-5× faster than Biomni's LLM-only retrieval
  • Strategy routing (ReAct / DAG Parallel / MCTS) — auto-selects optimal execution strategy
  • Multi-agent pipeline (Planner → Executor → Critic → Synthesizer) — parallel execution & self-validation
  • Persistent memory & knowledge graph — learns from past analyses
  • Multimodal support (sequence / structure / image encoders) — ESM-2, DNABERT-2, BiomedCLIP
  • Biomni tool bridge (200+ tools can be reused) — full backward compatibility

Project Website

Coming soon. For now, launch the local Web UI:

http://localhost:7860

Quick Start

1) Installation

Option A: One-Click Install (Recommended)

cd /path/to/BioCortex
bash install.sh            # Full install (conda + pip)
bash install.sh --minimal  # Core only (skip torch/multimodal)

Option B: Manual Install (Conda)

# Create and activate conda environment
conda create -n biocortex_env python=3.11 -y
conda activate biocortex_env

# Install dependencies
pip install -r requirements.txt

# Install BioCortex (editable)
pip install -e .

Option C: From biocortex_env (recommended)

cd /path/to/BioCortex/biocortex_env
conda env create -f environment.yml
conda activate biocortex
Rscript install_r_packages.R r_packages.yml

2) Configure API Keys

cp env_example.txt .env

Edit .env and set your LLM provider:

# ── Option 1: Anthropic (Claude) ──
ANTHROPIC_API_KEY=sk-ant-xxx

# ── Option 2: OpenAI (GPT-4o) ──
OPENAI_API_KEY=sk-xxx

# ── Option 3: DashScope / Qwen (recommended for China) ──
BIOCORTEX_CUSTOM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
BIOCORTEX_CUSTOM_API_KEY=sk-xxx
BIOCORTEX_REASONING_MODEL=qwen3-max
BIOCORTEX_CODER_MODEL=qwen3-max
BIOCORTEX_FAST_MODEL=qwen3-max
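As a quick sanity check that your `.env` is being picked up, you can load and inspect the provider variables at runtime. BioCortex resolves these through `biocortex/config.py`; the minimal loader below is an illustrative stand-in, not the project's actual implementation:

```python
import os

def load_env(path=".env"):
    """Minimal .env loader (illustrative): KEY=VALUE lines; '#' starts a comment."""
    values = {}
    try:
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, value = line.partition("=")
                    values[key.strip()] = value.strip()
    except FileNotFoundError:
        pass  # no .env yet -- fall back to whatever is already exported
    os.environ.update(values)
    return values

load_env()
# Any one provider is enough for BioCortex to start.
providers = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "BIOCORTEX_CUSTOM_API_KEY")
configured = [k for k in providers if os.environ.get(k)]
print("Configured providers:", configured or "none -- edit .env first")
```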

3) Single-Cell & Spatial Visualization (Vitessce)

BioCortex includes an interactive single-cell and spatial transcriptomics viewer powered by Vitessce. It loads directly from CDN — no build step required.

What it does:

  • Opens .h5ad files in the Results Viewer with interactive dark-themed visualizations
  • Automatically detects embeddings (UMAP, t-SNE, PCA) and spatial coordinates
  • Displays: Scatterplots, Spatial distribution, Cell Sets, Cell Set Sizes
  • For small datasets (<30k cells): also shows Gene List, Heatmap, Expression by Cell Set
  • Includes CellChat 🧠 — an AI chat assistant that can answer questions about your data

Required Python packages (already in requirements.txt):

pip install anndata scanpy

Optional for UMAP computation (if your h5ad only has PCA):

pip install leidenalg igraph

Supported h5ad data:

| Field | Description | Required |
|---|---|---|
| obsm['X_umap'] | UMAP coordinates | At least one embedding |
| obsm['X_tsne'] | t-SNE coordinates | or PCA (auto-computes UMAP) |
| obsm['X_pca'] | PCA coordinates | Fallback |
| obsm['spatial'] | Spatial coordinates | For spatial transcriptomics |
| obs[categorical cols] | Cell type annotations | For cell set coloring |
| X (expression matrix) | Gene expression | For Gene List / Heatmap |

For integrating the h5ad viewer in a new project (dependencies, env vars, API routes, and frontend iframe), see biocortex/web/docs/VITESSCE_H5AD_SETUP.md.
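The view-selection rules above (embedding priority, spatial detection, the 30k-cell gate on expression views) can be sketched in plain Python. The real logic lives in `biocortex/web`; the function name and return shape here are illustrative assumptions:

```python
# Keys follow AnnData's .obsm naming convention.
EMBEDDING_PRIORITY = ["X_umap", "X_tsne", "X_pca"]

def pick_views(obsm_keys, n_cells):
    """Return which embedding to plot and which extra views to enable."""
    embedding = next((k for k in EMBEDDING_PRIORITY if k in obsm_keys), None)
    return {
        "embedding": embedding,                 # UMAP > t-SNE > PCA
        "spatial": "spatial" in obsm_keys,      # spatial scatter if coordinates exist
        "expression_views": n_cells < 30_000,   # Gene List / Heatmap gate
    }

print(pick_views({"X_pca", "spatial"}, n_cells=12_000))
```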

4) Run Web UI

python biocortex_web_app.py --port 7860

Open:

http://localhost:7860

Python Usage

from biocortex.agent import BioCortexAgent

agent = BioCortexAgent()
report = agent.go("Analyze scRNA-seq data: QC, clustering, annotation, DEGs")

Force a strategy:

from biocortex.config import ReasoningStrategy
agent.go("Discover drug targets for ALS", strategy=ReasoningStrategy.MCTS)

3-Stage Hybrid Retrieval (Core Advantage)

BioCortex uses a 3-stage hybrid retrieval pipeline for tool selection — a key architectural advantage over Biomni:

Stage 1: Vector Semantic Search

  • Fast recall using sentence embeddings (all-MiniLM-L6-v2)
  • Retrieves top-50 candidates in milliseconds
  • Scales to 10,000+ tools via FAISS index
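Conceptually, Stage 1 is a nearest-neighbor search over tool-description embeddings. The toy sketch below uses plain cosine similarity in place of a FAISS index, with made-up tool names and 2-D vectors for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def vector_recall(query_vec, tool_vecs, k=50):
    """Rank tools by similarity to the query embedding (FAISS stand-in)."""
    scored = sorted(tool_vecs, key=lambda t: cosine(query_vec, t[1]), reverse=True)
    return [name for name, _ in scored[:k]]

tools = [
    ("cluster_cells", [0.9, 0.1]),
    ("align_reads", [0.1, 0.9]),
    ("run_umap", [0.8, 0.3]),
]
print(vector_recall([1.0, 0.0], tools, k=2))  # -> ['cluster_cells', 'run_umap']
```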

Stage 2: Knowledge Graph Expansion

  • Discovers implicit dependencies via 2-hop BFS traversal
  • Example: Query "scRNA-seq clustering" → expands to include "batch correction", "doublet removal"
  • Leverages tool co-occurrence graph + biological ontologies (GO, KEGG)
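The 2-hop expansion is a bounded breadth-first traversal over the tool graph. A minimal sketch, using an assumed toy co-occurrence graph (the edges are illustrative, not BioCortex's actual KG):

```python
from collections import deque

# Toy co-occurrence edges for illustration only.
GRAPH = {
    "clustering": ["batch_correction", "normalization"],
    "batch_correction": ["doublet_removal"],
    "normalization": ["qc_filtering"],
}

def expand(seeds, graph, max_hops=2):
    """Breadth-first expansion of seed tools up to max_hops neighbors."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, depth + 1))
    return seen

print(sorted(expand({"clustering"}, GRAPH)))
```

With the seed "clustering", the 2-hop frontier pulls in both direct neighbors (batch correction, normalization) and their neighbors (doublet removal, QC filtering), matching the example in the bullets above.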

Stage 3: LLM Reranking

  • Precision selection using task context
  • LLM picks final top-15 tools from expanded candidates
  • Context-aware: considers task requirements, data types, dependencies
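Stage 3 amounts to formatting the expanded candidate set into a prompt and letting the LLM choose the final shortlist. The prompt wording below is hypothetical; the real template lives inside BioCortex's retriever:

```python
def build_rerank_prompt(task, candidates, top_k=15):
    """Format expanded candidates into a reranking prompt (hypothetical wording)."""
    lines = [f"Task: {task}", f"Pick the {top_k} most relevant tools:"]
    lines += [f"{i}. {name}: {desc}" for i, (name, desc) in enumerate(candidates, 1)]
    return "\n".join(lines)

prompt = build_rerank_prompt(
    "scRNA-seq clustering",
    [("batch_correction", "Remove batch effects"),
     ("qc_filtering", "Filter low-quality cells")],
)
print(prompt)
```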

Performance vs. Biomni

| Metric | Biomni (LLM-only) | BioCortex (3-stage) | Speedup |
|---|---|---|---|
| Tool retrieval time | 3-5 seconds | 0.5-1 second | 3-5× |
| Scalability | Limited by LLM context | 10,000+ tools | 100× |
| Dependency discovery | Manual | Automatic (KG) | n/a |

Default: Vector search is enabled by default. To disable (e.g., for offline environments):

# In .env file:
BIOCORTEX_ENABLE_VECTOR_SEARCH=false

Biomni tools (200+ domain tools)

Biomni has its own GitHub repo. BioCortex can use it in two ways:

  1. Embedded (default) — If vendor/Biomni/biomni exists (e.g. after adding the Biomni submodule and running git submodule update --init --recursive), no configuration is needed. The Biomni repo URL will be provided once this framework is on GitHub; see vendor/README.md for submodule setup.
  2. Custom path — Override with your own Biomni clone:
python biocortex_web_app.py --biomni-path /path/to/Biomni/biomni

Or in Python:

agent = BioCortexAgent(biomni_path="/path/to/Biomni/biomni")  # optional; falls back to vendor/Biomni/biomni when unset

Multimodal Support

BioCortex can encode:

  • Sequences (protein, DNA, RNA)
  • Structures (PDB, chemistry)
  • Images (microscopy / pathology)

Optional dependencies (install only if needed):

pip install torch transformers torchvision fair-esm pillow

Knowledge Graph

BioCortex maintains a persistent knowledge graph and auto-learns new facts from analyses.

The graph supports:

  • Tool relations
  • GO / KEGG / UniProt integration
  • Entity search and context generation

Web UI Features

  • Chat-based analysis interface
  • Strategy selector (Auto / ReAct / DAG / MCTS)
  • DAG execution visualization (Mermaid)
  • Tool browser (200+ tools)
  • Knowledge graph explorer
  • Multimodal encoder panel
  • Memory viewer
  • E2E test panel
  • Results Viewer — multi-tab file browser for text, images, notebooks, CSV, and more
  • Vitessce Integration — interactive single-cell / spatial transcriptomics viewer (dark theme)
    • Scatterplot (UMAP / t-SNE / PCA), Spatial distribution, Cell Sets, Cell Set Sizes
    • Gene List, Heatmap, Expression by Cell Set (for datasets < 30k cells)
    • Loads from CDN, no npm/build step required
  • CellChat 🧠 — AI data assistant embedded in the Vitessce viewer
    • Ask questions about cell types, spatial patterns, gene expression
    • Uses your configured LLM (reasoning model)
    • Context-aware: knows dataset metadata, cell distributions, top genes

Multi-User Web UI (Roadmap)

BioCortex currently ships with a single-user Gradio UI. If you need multi-user isolation, authentication, and per-user files/history:

  • Recommended path: adapt the multi-user modules from Biomni (Biomni/web/*)
  • Alternative: deploy behind an auth gateway (Nginx basic auth / OAuth)
  • We plan to integrate native multi-user support in a future release

Configuration

Config is managed by biocortex/config.py and supports .env overrides:

# ── LLM Provider (at least one required) ──────────
# Anthropic
ANTHROPIC_API_KEY=sk-ant-xxx

# OpenAI
OPENAI_API_KEY=sk-xxx

# Custom endpoint (DashScope, vLLM, Ollama, etc.)
BIOCORTEX_CUSTOM_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
BIOCORTEX_CUSTOM_API_KEY=sk-xxx

# ── Model Names ───────────────────────────────────
BIOCORTEX_REASONING_MODEL=qwen3-max     # Planner + Synthesizer + CellChat
BIOCORTEX_CODER_MODEL=qwen3-max         # Code generation
BIOCORTEX_FAST_MODEL=qwen3-max          # Quick tasks, validation

# ── Strategy ──────────────────────────────────────
BIOCORTEX_STRATEGY=dag_parallel          # react / dag_parallel / mcts / hypothesis

# ── Execution ─────────────────────────────────────
BIOCORTEX_TIMEOUT=600
BIOCORTEX_USE_DOCKER=false

# ── Paths ─────────────────────────────────────────
BIOCORTEX_DATA_PATH=./data
BIOCORTEX_OUTPUT_PATH=./output

Per-user config: Each user can override settings in ./users/{username}/.env.
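The override order is simple: per-user values shadow project-level ones. A minimal sketch of that precedence, assuming both files have already been parsed into dicts (the actual merge is done by `biocortex/config.py`):

```python
def effective_config(project_env: dict, user_env: dict) -> dict:
    """Merge configs; ./users/{username}/.env overrides the project-level .env."""
    merged = dict(project_env)
    merged.update(user_env)
    return merged

base = {"BIOCORTEX_STRATEGY": "dag_parallel", "BIOCORTEX_TIMEOUT": "600"}
user = {"BIOCORTEX_STRATEGY": "mcts"}
print(effective_config(base, user))  # -> {'BIOCORTEX_STRATEGY': 'mcts', 'BIOCORTEX_TIMEOUT': '600'}
```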


Citation

If you use BioCortex in research, please cite:

@software{biocortex2026,
  title  = {BioCortex: A Next-Generation AI Agent for All Biological Systems},
  author = {BioCortex Team},
  year   = {2026},
  url    = {REPLACE_WITH_PROJECT_URL}
}

Contributing

We welcome contributions from the community:

  • New tools (domain-specific analysis functions)
  • Datasets and knowledge sources
  • Retrievers / KG improvements
  • Multimodal encoders and model adapters
  • Docs & tutorials

How to contribute:

  1. Fork the repo and create a feature branch
  2. Add your changes with tests or examples
  3. Open a pull request describing your changes

Security Note

BioCortex executes LLM-generated code. For production use, enable sandboxing and isolate the runtime environment. Do not run with sensitive credentials or files on untrusted tasks.
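At minimum, generated code should run in a separate process with a hard timeout (cf. `BIOCORTEX_TIMEOUT`). The sketch below shows that baseline; a process boundary is not a real sandbox, so combine it with containers (`BIOCORTEX_USE_DOCKER=true`) or another isolation layer for production. The function name is an assumption for illustration:

```python
import subprocess
import sys

def run_untrusted(code: str, timeout: int = 600):
    """Run generated code in a fresh interpreter process with a hard timeout.
    This limits runaway execution but does NOT restrict filesystem or network
    access -- pair it with container-level isolation for untrusted tasks."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired on overrun
    )

result = run_untrusted("print(6 * 7)", timeout=10)
print(result.stdout.strip())  # -> 42
```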


License

Apache 2.0
