DeepCritical isn't just another research assistant—it's a framework for building entire research ecosystems. While a typical user asks one question, DeepCritical generates datasets of hypotheses, tests them systematically, runs simulations, and produces comprehensive reports—all through configurable Hydra-based workflows.
```yaml
# Hydra makes this possible - single config generates entire research workflows
flows:
  hypothesis_generation: {enabled: true, batch_size: 100}
  hypothesis_testing: {enabled: true, validation_environments: ["simulation", "real_world"]}
  validation: {enabled: true, methods: ["statistical", "experimental"]}
  simulation: {enabled: true, frameworks: ["python", "docker"]}
  reporting: {enabled: true, formats: ["academic_paper", "dpo_dataset"]}
```
- Hydra Configuration: `configs/` directory with flow-based composition
- Pydantic Graph: Stateful workflow execution with `ResearchState`
- Pydantic AI Agents: Multi-agent orchestration with `@defer` tools
- Flow Routing: Dynamic composition based on `flows.*.enabled` flags
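To illustrate the last point, here is a minimal sketch of selecting flows by their `enabled` flags, assuming OmegaConf/Hydra configs shaped like the YAML above; none of this is DeepCritical's actual routing code:

```python
# Minimal sketch: select flows by their `enabled` flags in a Hydra/OmegaConf config.
# The config keys mirror the YAML above; the routing logic itself is illustrative.
from omegaconf import OmegaConf

cfg = OmegaConf.create({
    "flows": {
        "hypothesis_generation": {"enabled": True, "batch_size": 100},
        "reporting": {"enabled": False},
    }
})

enabled_flows = [name for name, flow in cfg.flows.items() if flow.get("enabled", False)]
print(enabled_flows)  # ['hypothesis_generation']
```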
The project already has the foundation for your vision:
```yaml
# Current flow configurations (configs/statemachines/flows/)
- hypothesis_generation.yaml  # Generate hypothesis datasets
- hypothesis_testing.yaml     # Test hypothesis environments
- execution.yaml              # Run experiments/simulations
- reporting.yaml              # Generate research outputs
- bioinformatics.yaml         # Multi-source data fusion
- rag.yaml                    # Retrieval-augmented workflows
- deepsearch.yaml             # Web research automation
```
```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class AgentOrchestrator:
    """Spawns nested REACT loops, manages subgraphs, coordinates multi-agent workflows"""
    config: AgentOrchestratorConfig
    nested_loops: Dict[str, NestedReactConfig]
    subgraphs: Dict[str, SubgraphConfig]
    break_conditions: List[BreakCondition]  # Loss functions for smart termination
```
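To make `break_conditions` concrete, here is a minimal, hypothetical sketch of a loss-driven break condition; the field names and loss function are illustrative assumptions, not the project's actual types:

```python
# Hypothetical sketch of a loss-driven break condition for nested REACT loops.
from dataclasses import dataclass
from typing import Any, Callable, Dict

@dataclass
class BreakCondition:
    name: str
    loss_fn: Callable[[Dict[str, Any]], float]  # maps loop state to a scalar loss
    threshold: float

    def should_break(self, state: Dict[str, Any]) -> bool:
        # Terminate the loop once the loss drops to the threshold or below.
        return self.loss_fn(state) <= self.threshold

# Example: stop iterating once answer quality is high enough (loss is low enough).
quality = BreakCondition("quality", lambda s: 1.0 - s.get("answer_score", 0.0), 0.2)
assert quality.should_break({"answer_score": 0.9})
```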
```python
from typing import Any, Dict, List
from pydantic import BaseModel

class HypothesisDataset(BaseModel):
    dataset_id: str
    hypotheses: List[Dict[str, Any]]  # Generated hypothesis batches
    source_workflows: List[str]
    metadata: Dict[str, Any]

class HypothesisTestingEnvironment(BaseModel):
    environment_id: str
    hypothesis: Dict[str, Any]
    test_configuration: Dict[str, Any]
    expected_outcomes: List[str]
    success_criteria: Dict[str, Any]
```
- Primary REACT: Main orchestration workflow
- Sub-workflows: Specialized execution paths (RAG, bioinformatics, search)
- Nested Loops: Multi-level reasoning with configurable break conditions
- Subgraphs: Modular workflow components
- Bioinformatics: Neo4j RAG, GO annotations, PubMed integration
- Search: Web search, deep search, integrated retrieval
- Code Execution: Docker sandbox, Python execution environments
- RAG: Vector stores, document processing, embeddings
- Analytics: Quality assessment, loss function evaluation
```yaml
# New flow configuration needed
flows:
  coding_agent:
    enabled: true
    languages: ["python", "r", "julia"]
    frameworks: ["pytorch", "tensorflow", "scikit-learn"]
    execution_environments: ["docker", "local", "cloud"]
```
```yaml
# Extend reporting.yaml
reporting:
  formats: ["academic_paper", "blog_post", "technical_report", "dpo_dataset"]
  agents:
    - role: "structure_organizer"
    - role: "content_writer"
    - role: "editor_reviewer"
    - role: "formatter_publisher"
```
- Persistent State: Non-agentics datasets for workflow state
- Trace Logging: Execution traces → formatted datasets
- Ana's Neo4j RAG: Agent-based knowledge base management
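As a hedged illustration of how trace logging can feed formatted datasets (the trace field names below are assumptions; the `dpo_dataset` output format above is the motivation), execution traces can be flattened into JSONL rows:

```python
# Illustrative sketch: convert execution traces into DPO-style dataset rows.
# The trace field names (question, best_answer, worst_answer) are assumptions.
import json
from typing import Dict, List

def traces_to_dpo_rows(traces: List[Dict]) -> List[Dict]:
    return [
        {"prompt": t["question"], "chosen": t["best_answer"], "rejected": t["worst_answer"]}
        for t in traces
    ]

with open("trace_dataset.jsonl", "w") as f:
    for row in traces_to_dpo_rows([]):  # pass real traces here
        f.write(json.dumps(row) + "\n")
```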
```python
from pydantic import BaseModel
from pydantic_ai import Agent

class MetaAgent(BaseModel):
    """Agent that uses DeepCritical to build and answer with custom agents"""

    def create_custom_agent(self, specification: AgentSpecification) -> Agent:
        # Generate agent configuration
        # Build agent with tools, prompts, capabilities
        # Deploy and execute
        pass
```
The beauty of Hydra integration is that we can build this incrementally:
```bash
# Start with hypothesis generation
deepresearch flows.hypothesis_generation.enabled=true question="machine learning"

# Add hypothesis testing
deepresearch flows.hypothesis_testing.enabled=true question="test ML hypothesis"

# Enable full research pipeline
deepresearch flows="{hypothesis_generation,testing,validation,simulation,reporting}"
```
```yaml
# configs/config.yaml - Main composition point
defaults:
  - hypothesis_generation: default
  - hypothesis_testing: default
  - execution: default
  - reporting: default

flows:
  hypothesis_generation: {enabled: true, batch_size: 50}
  hypothesis_testing: {enabled: true, validation_frameworks: ["simulation"]}
  execution: {enabled: true, compute_backends: ["docker", "local"]}
  reporting: {enabled: true, output_formats: ["markdown", "json"]}
```
```python
from dataclasses import dataclass
from pydantic_graph import BaseNode, GraphRunContext

@dataclass
class ResearchPipeline(BaseNode[ResearchState]):
    async def run(self, ctx: GraphRunContext[ResearchState]) -> NextNode:
        # Check enabled flows and compose dynamically
        if ctx.state.config.flows.hypothesis_generation.enabled:
            return HypothesisGenerationNode()
        elif ctx.state.config.flows.hypothesis_testing.enabled:
            return HypothesisTestingNode()
        # ... etc
```
```python
@defer
def generate_hypothesis_dataset(
    ctx: RunContext[AgentDependencies],
    research_question: str,
    batch_size: int,
) -> HypothesisDataset:
    """Generate a dataset of testable hypotheses"""
    # Implementation using existing tools and agents
    return dataset
```
deepresearch question="CRISPR applications in cancer therapy" \
flows.hypothesis_generation.enabled=true \
flows.reporting.format="literature_review"
deepresearch question="protein folding prediction improvements" \
flows.hypothesis_generation.enabled=true \
flows.hypothesis_testing.enabled=true \
flows.simulation.enabled=true
# Generate entire research program from minimal input
deepresearch question="novel therapeutic approaches for Alzheimer's" \
flows="{hypothesis_generation,testing,validation,reporting}" \
outputs.enable_dpo_datasets=true
This project provides a foundation for:
- Domain-Specific Research Agents: Biology, chemistry, physics, social sciences
- Publication Pipeline Automation: From hypothesis → experiment → paper
- Collaborative Research Platforms: Multi-researcher workflow coordination
- AI Research on AI: Using the system to improve itself
The framework is ready for extension:
```bash
# Current capabilities
uv run deepresearch --help

# Enable specific flows
uv run deepresearch question="your question" flows.hypothesis_generation.enabled=true

# Configure for batch processing
uv run deepresearch --config-name=config_with_modes \
  question="batch research questions" \
  app_mode=multi_level_react
```
- How should we structure the "final" meta-agent system? (Self-improving, agent factories, etc.)
- What database backends for persistent state? (SQLite, PostgreSQL, vector stores?)
- How to handle multi-researcher collaboration? (Access control, workflow sharing, etc.)
- What loss functions and judges for research quality? (Novelty, rigor, impact, etc.)
This is a sketchpad for building the future of autonomous research—let's collaborate on making it a reality! 🔬✨
A comprehensive research automation platform architecture for autonomous scientific discovery workflows.
```bash
# Install uv and dependencies
uv sync

# Single REACT mode
uv run deepresearch question="What is machine learning?" app_mode=single_react

# Multi-level REACT with nested loops
uv run deepresearch question="Analyze machine learning in drug discovery" app_mode=multi_level_react

# Complex nested orchestration
uv run deepresearch question="Design a comprehensive research framework" app_mode=nested_orchestration

# Loss-driven execution
uv run deepresearch question="Optimize research quality" app_mode=loss_driven

# Using configuration files
uv run deepresearch --config-name=config_with_modes question="Your question" app_mode=multi_level_react
```
```bash
# Single REACT mode
deepresearch question="What is machine learning?" app_mode=single_react

# Multi-level REACT with nested loops
deepresearch question="Analyze machine learning in drug discovery" app_mode=multi_level_react

# Complex nested orchestration
deepresearch question="Design a comprehensive research framework" app_mode=nested_orchestration

# Loss-driven execution
deepresearch question="Optimize research quality" app_mode=loss_driven

# Using configuration files
deepresearch --config-name=config_with_modes question="Your question" app_mode=multi_level_react
```
```bash
# Install uv if not already installed
# Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
# macOS/Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies and create virtual environment
uv sync

# Run the application
uv run deepresearch --help
```
```bash
# Create virtual environment (Windows; on macOS/Linux use: source .venv/bin/activate)
python -m venv .venv && .venv\Scripts\activate

# Install package
pip install -e .
```
```bash
# Run default workflow
uv run deepresearch

# Run with custom question
uv run deepresearch question="What are PRIME's core contributions?"

# Run with specific configuration
uv run deepresearch --config-name=config_with_modes question="Your question" app_mode=multi_level_react
```
```bash
# Run default workflow
python -m deepresearch.app

# Run with custom question
python -m deepresearch.app question="What are PRIME's core contributions?"
```
```bash
# Design therapeutic antibody
uv run deepresearch flows.prime.enabled=true question="Design a therapeutic antibody for SARS-CoV-2"

# Analyze protein sequence
uv run deepresearch flows.prime.enabled=true question="Analyze protein sequence MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"

# Predict protein structure
uv run deepresearch flows.prime.enabled=true question="Predict 3D structure of protein P12345"
```
```bash
# Design therapeutic antibody
python -m deepresearch.app flows.prime.enabled=true question="Design a therapeutic antibody for SARS-CoV-2"

# Analyze protein sequence
python -m deepresearch.app flows.prime.enabled=true question="Analyze protein sequence MKTVRQERLKSIVRILERSKEPVSGAQLAEELSVSRQVIVQDIAYLRSLGYNIVATPRGYVLAGG"

# Predict protein structure
python -m deepresearch.app flows.prime.enabled=true question="Predict 3D structure of protein P12345"
```
```bash
# GO + PubMed reasoning for gene function
python -m deepresearch.app flows.bioinformatics.enabled=true question="What is the function of TP53 gene based on GO annotations and recent literature?"

# Multi-source drug-target analysis
python -m deepresearch.app flows.bioinformatics.enabled=true question="Analyze the relationship between drug X and protein Y using expression profiles and interactions"

# Protein structure-function analysis
python -m deepresearch.app flows.bioinformatics.enabled=true question="What is the likely function of protein P12345 based on its structure and GO annotations?"
```
```bash
# PRIME flow (protein engineering)
python -m deepresearch.app flows.prime.enabled=true

# Bioinformatics flow (data fusion & reasoning)
python -m deepresearch.app flows.bioinformatics.enabled=true

# DeepSearch flow (web research)
python -m deepresearch.app flows.deepsearch.enabled=true

# Challenge flow (experimental)
python -m deepresearch.app challenge.enabled=true

# Custom plan steps
python -m deepresearch.app plan='["clarify scope","collect sources","synthesize"]'

# Manual confirmation mode
python -m deepresearch.app flows.prime.params.manual_confirmation=true

# Disable adaptive re-planning
python -m deepresearch.app flows.prime.params.adaptive_replanning=false
```
- Hydra Configuration: Uses Hydra composition for configuration (`configs/`) per Hydra docs
- Pydantic Graph: Stateful workflow execution (`deepresearch/app.py`) per Pydantic Graph docs
- PRIME Integration: Replicates the PRIME paper's three-stage architecture
```text
┌─────────┐    ┌─────────┐    ┌─────────┐
│  Parse  │───▶│  Plan   │───▶│ Execute │
│         │    │         │    │         │
│  Query  │    │   DAG   │    │  Tool   │
│ Parser  │    │  Gen.   │    │  Exec.  │
└─────────┘    └─────────┘    └─────────┘
```
- Parse → `QueryParser` - Semantic/syntactic analysis of research queries
- Plan → `PlanGenerator` - DAG workflow construction with 65+ tools
- Execute → `ToolExecutor` - Adaptive re-planning with strategic/tactical recovery
- 65+ Tools across 6 categories: Knowledge Query, Sequence Analysis, Structure Prediction, Molecular Docking, De Novo Design, Function Prediction
- Scientific Intent Detection: Automatically categorizes queries (protein_design, binding_analysis, structure_prediction, etc.)
- Domain-Specific Heuristics: Immunology, enzymology, cell biology, general protein domains
- Strategic Re-planning: Tool substitution (BLAST → ProTrek, AlphaFold2 → ESMFold); see the sketch after this list
- Tactical Re-planning: Parameter adjustment (E-value relaxation, exhaustiveness tuning)
- Execution History: Comprehensive tracking with failure pattern analysis
- Success Criteria Validation: Quantitative metrics (pLDDT, E-values) and binary outcomes
- Verifiable Results: All conclusions come from validated tools, never from LLM generation
- Tool Validation: Strict input/output schema compliance and type checking
- Mock Implementations: Complete development environment with realistic outputs
- Error Recovery: Graceful handling with actionable recommendations
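A minimal sketch of the strategic re-planning idea, assuming a simple fallback table keyed by tool name; the substitution pairs mirror the list above, and `run_tool` is a hypothetical callable, not part of the project's API:

```python
# Hedged sketch of strategic re-planning via tool substitution.
from typing import Any, Callable, Dict

FALLBACKS = {"blast": "protrek", "alphafold2": "esmfold"}

def run_with_fallback(tool: str, run_tool: Callable[[str, Dict], Any], params: Dict) -> Any:
    try:
        return run_tool(tool, params)
    except RuntimeError:
        substitute = FALLBACKS.get(tool)
        if substitute is None:
            raise  # no strategic substitute known; tactical re-planning would go here
        return run_tool(substitute, params)  # swap the tool and retry
```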
- GO + PubMed: Gene Ontology annotations with paper context for reasoning tasks
- GEO + CMAP: Gene expression data with perturbation profiles
- DrugBank + TTD + CMAP: Drug-target-perturbation relationship graphs
- PDB + IntAct: Protein structure-interaction datasets
- Specialized Agents: DataFusionAgent, GOAnnotationAgent, ReasoningAgent, DataQualityAgent
- Pydantic AI Integration: Multi-model reasoning with evidence integration
- Deferred Tools: Efficient data processing with registry integration
- Quality Assessment: Cross-database consistency and evidence validation
- Non-Reductionist Approach: Multi-source evidence integration beyond structural similarity
- Evidence Code Prioritization: IDA (gold standard) > EXP > computational predictions; see the sketch below
- 18 Vendored Bioinformatics Tools: FastQC, Samtools, Bowtie2, MACS3, HOMER, HISAT2, BEDTools, STAR, BWA, MultiQC, Salmon, StringTie, FeatureCounts, TrimGalore, Kallisto, HTSeq, TopHat, Picard
- Pydantic AI Integration: Strongly-typed tool decorators with automatic agent registration
- Testcontainers Deployment: Isolated execution environments for reproducible research
- Bioinformatics Pipeline Support: Complete RNA-seq, ChIP-seq, and genomics analysis workflows
- Cross-Database Validation: Consistency checks and temporal relevance
- Human Curation Integration: Leverages existing curation expertise
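A minimal sketch of the evidence-code prioritization rule; codes beyond IDA and EXP, and their exact ordering, are assumptions for illustration:

```python
# Sketch: rank annotations so IDA beats EXP, which beats computational codes.
EVIDENCE_RANK = {"IDA": 0, "EXP": 1, "ISS": 2, "IEA": 3}  # lower = stronger evidence

def best_annotation(annotations: list) -> dict:
    return min(annotations, key=lambda a: EVIDENCE_RANK.get(a["evidence_code"], 99))

best = best_annotation([
    {"go_term_id": "GO:0006977", "evidence_code": "IEA"},
    {"go_term_id": "GO:0006977", "evidence_code": "IDA"},
])
assert best["evidence_code"] == "IDA"
```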
### Example Data Fusion
```json
{
  "pmid": "12345678",
  "title": "p53 mediates the DNA damage response in mammalian cells",
  "abstract": "DNA damage induces p53 stabilization, leading to cell cycle arrest and apoptosis.",
  "gene_id": "P04637",
  "gene_symbol": "TP53",
  "go_term_id": "GO:0006977",
  "go_term_name": "DNA damage response",
  "evidence_code": "IDA",
  "annotation_note": "Curated based on experimental results in Figure 3."
}
```
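A hedged sketch of validating such a record, assuming Pydantic v2; the model below simply mirrors the JSON fields and is not the project's actual data type:

```python
# Sketch: validate the fusion record above with a Pydantic model mirroring its fields.
from pydantic import BaseModel

class FusionRecord(BaseModel):
    pmid: str
    title: str
    abstract: str
    gene_id: str
    gene_symbol: str
    go_term_id: str
    go_term_name: str
    evidence_code: str
    annotation_note: str

# `raw_json` would hold the JSON document shown above.
record = FusionRecord.model_validate_json(raw_json)
print(record.gene_symbol, record.evidence_code)  # "TP53", "IDA"
```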
- PRIME Flow: Protein engineering with 65+ specialized tools
- Bioinformatics Flow: Multi-source data fusion and integrative reasoning
- DeepSearch Flow: Web research and information gathering
- Challenge Flow: Experimental workflows for research challenges
- Default Flow: General-purpose research automation
```text
Plan → Route to Flow → Execute Subflow → Synthesize Results
         │
         ├─ PRIME:          Parse → Plan → Execute → Evaluate
         ├─ Bioinformatics: Parse → Fuse → Assess → Reason → Synthesize
         ├─ DeepSearch:     DSPlan → DSExecute → DSAnalyze → DSSynthesize
         └─ Challenge:      PrepareChallenge → RunChallenge → EvaluateChallenge
```
Key parameters in `configs/config.yaml`:
```yaml
# Research parameters
question: "Your research question here"
plan: ["step1", "step2", "step3"]
retries: 3
manual_confirm: false

# Flow control
flows:
  prime:
    enabled: true
    params:
      adaptive_replanning: true
      manual_confirmation: false
      tool_validation: true
  bioinformatics:
    enabled: true
    data_sources:
      go:
        enabled: true
        evidence_codes: ["IDA", "EXP"]
        year_min: 2022
        quality_threshold: 0.9
      pubmed:
        enabled: true
        max_results: 50
        include_full_text: true
    fusion:
      quality_threshold: 0.85
      max_entities: 500
      cross_reference_enabled: true
    reasoning:
      model: "anthropic:claude-sonnet-4-0"
      confidence_threshold: 0.8
      integrative_approach: true

# Output management
hydra:
  run:
    dir: outputs/${now:%Y-%m-%d}/${now:%H-%M-%S}
  sweep:
    dir: multirun/${now:%Y-%m-%d}/${now:%H-%M-%S}
```
Each flow has its own configuration file:

- `configs/statemachines/flows/prime.yaml` - PRIME flow parameters
- `configs/statemachines/flows/bioinformatics.yaml` - Bioinformatics flow parameters
- `configs/statemachines/flows/deepsearch.yaml` - DeepSearch parameters
- `configs/statemachines/flows/hypothesis_generation.yaml` - Hypothesis flow
- `configs/statemachines/flows/execution.yaml` - Execution flow
- `configs/statemachines/flows/reporting.yaml` - Reporting flow
DeepCritical supports multiple LLM providers through OpenAI-compatible APIs:
```yaml
# configs/llm/vllm_pydantic.yaml
provider: "vllm"
model_name: "meta-llama/Llama-3-8B"
base_url: "http://localhost:8000/v1"
api_key: null
generation:
  temperature: 0.7
  max_tokens: 512
  top_p: 0.9
```
Supported providers:
- vLLM: High-performance local inference
- llama.cpp: Efficient GGUF model serving
- TGI: Hugging Face Text Generation Inference
- Custom: Any OpenAI-compatible server
See LLM Models Documentation for detailed configuration and usage examples.
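Because every supported provider speaks the OpenAI wire protocol, a plain `openai` client pointed at the configured `base_url` is enough to smoke-test a server. The values below mirror the vLLM config above; the dummy API key is an assumption for local servers that ignore authentication:

```python
# Sketch: talk to any OpenAI-compatible server (vLLM, llama.cpp, TGI) with the openai client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local")  # key unused locally

resp = client.chat.completions.create(
    model="meta-llama/Llama-3-8B",
    messages=[{"role": "user", "content": "Summarize PRIME in one sentence."}],
    temperature=0.7,
    max_tokens=512,
    top_p=0.9,
)
print(resp.choices[0].message.content)
```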
Prompt templates in `configs/prompts/`:

- `configs/prompts/prime_parser.yaml` - Query parsing prompts
- `configs/prompts/prime_planner.yaml` - Workflow planning prompts
- `configs/prompts/prime_executor.yaml` - Tool execution prompts
- `configs/prompts/prime_evaluator.yaml` - Result evaluation prompts
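Since these templates live under Hydra's config tree, they can also be loaded standalone with OmegaConf; the key accessed below is an assumed example, as the actual template schema isn't shown here:

```python
# Sketch: load a prompt template directly with OmegaConf.
from omegaconf import OmegaConf

prompts = OmegaConf.load("configs/prompts/prime_parser.yaml")
# The available keys depend on the template; "system" is an assumed example.
print(OmegaConf.select(prompts, "system"))
```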
To enable coverage reporting with Codecov:

1. Set up the repository in Codecov:
   - Visit https://app.codecov.io/gh/DeepCritical/DeepCritical
   - Click "Add new repository" or "Setup repo" if prompted
   - Follow the setup wizard to connect your GitHub repository

2. Generate a Codecov token:
   - In Codecov, go to your repository settings
   - Navigate to "Repository Settings" > "Tokens"
   - Generate a new token with "upload" permissions

3. Add the token as a GitHub secret:
   - In your GitHub repository, go to Settings > Secrets and variables > Actions
   - Click "New repository secret"
   - Name: `CODECOV_TOKEN`
   - Value: Your Codecov token from step 2

4. Verify setup:
   - Push a commit to trigger the CI pipeline
   - Check that coverage reports appear in Codecov
The CI workflow will automatically upload coverage reports once the repository is configured in Codecov and the token is added as a secret.
```bash
# Install development dependencies
uv sync --dev

# Run tests
uv run pytest

# Run linting
uv run ruff check .

# Add new dependencies
uv add package_name

# Add development dependencies
uv add --dev package_name

# Update dependencies
uv lock --upgrade

# Run scripts in the project environment
uv run python script.py
```
```text
DeepCritical/
├── deepresearch/              # Main package
│   ├── app.py                 # Pydantic Graph workflow
│   ├── src/                   # PRIME implementation
│   │   ├── agents/            # PRIME agents (Parser, Planner, Executor)
│   │   ├── datatypes/         # Bioinformatics data types
│   │   ├── statemachines/     # Bioinformatics workflows
│   │   └── utils/             # Utilities (Tool Registry, Execution History)
│   └── tools/                 # Tool implementations
├── configs/                   # Hydra configuration
│   ├── config.yaml            # Main configuration
│   ├── prompts/               # Prompt templates
│   └── statemachines/         # Flow configurations
├── docs/                      # Documentation
│   └── bioinformatics_integration.md
└── .cursor/rules/             # Cursor rules for development
```
1. Create Flow Configuration:

   ```yaml
   # configs/statemachines/flows/my_flow.yaml
   enabled: true
   params:
     custom_param: "value"
   ```

2. Implement Nodes:

   ```python
   @dataclass
   class MyFlowNode(BaseNode[ResearchState]):
       async def run(self, ctx: GraphRunContext[ResearchState]) -> NextNode:
           # Implementation
           return NextNode()
   ```

3. Register in Graph:

   ```python
   # In run_graph function
   nodes = (..., MyFlowNode())
   ```

4. Add Flow Routing:

   ```python
   # In Plan node
   if getattr(cfg.flows, "my_flow", {}).get("enabled"):
       return MyFlowNode()
   ```
1. Define Tool Specification:

   ```python
   tool_spec = ToolSpec(
       name="my_tool",
       category=ToolCategory.SEQUENCE_ANALYSIS,
       input_schema={"sequence": "string"},
       output_schema={"result": "dict"},
       success_criteria={"min_confidence": 0.8},
   )
   ```

2. Implement Tool Runner:

   ```python
   class MyToolRunner(ToolRunner):
       def run(self, parameters: Dict[str, Any]) -> ExecutionResult:
           # Tool implementation
           return ExecutionResult(success=True, data=result)
   ```

3. Register Tool:

   ```python
   registry.register_tool(tool_spec, MyToolRunner)
   ```
1. Create Data Types:

   ```python
   from pydantic import BaseModel, Field

   class GOAnnotation(BaseModel):
       pmid: str = Field(..., description="PubMed ID")
       gene_id: str = Field(..., description="Gene identifier")
       go_term: GOTerm = Field(..., description="GO term")
       evidence_code: EvidenceCode = Field(..., description="Evidence code")
   ```

2. Implement Agents:

   ```python
   from pydantic_ai import Agent

   class DataFusionAgent:
       def __init__(self, model_name: str):
           self.agent = Agent(
               model=AnthropicModel(model_name),
               deps_type=BioinformaticsAgentDeps,
               result_type=DataFusionResult,
           )
   ```

3. Create Workflow Nodes:

   ```python
   @dataclass
   class FuseDataSources(BaseNode[BioinformaticsState]):
       async def run(self, ctx: GraphRunContext[BioinformaticsState]) -> NextNode:
           # Data fusion logic
           return AssessDataQuality()
   ```
```bash
# Run multiple experiments
python -m deepresearch.app --multirun \
  question="Design antibody for SARS-CoV-2",question="Analyze protein P12345" \
  flows.prime.enabled=true
```
```python
from deepresearch.src.utils.tool_registry import ToolRegistry, ToolSpec, ToolCategory

# Create custom tool
registry = ToolRegistry()
tool_spec = ToolSpec(
    name="custom_analyzer",
    category=ToolCategory.SEQUENCE_ANALYSIS,
    input_schema={"sequence": "string"},
    output_schema={"analysis": "dict"},
)
registry.register_tool(tool_spec)
```
```python
from deepresearch.src.utils.execution_history import ExecutionHistory

# Load execution history
history = ExecutionHistory.load_from_file("outputs/2024-01-01/12-00-00/execution_history.json")

# Analyze performance
summary = history.get_execution_summary()
print(f"Success rate: {summary['success_rate']:.2%}")
print(f"Tools used: {summary['tools_used']}")
```
DeepCritical is built by an amazing community of contributors! We automatically recognize and celebrate everyone who helps make this project better.
Our Contributors page showcases all the wonderful people who contribute to DeepCritical.
- Contributors: See our contributor graph for commit activity
- Recent Activity: Check the pulse for recent contributions
- Community: Join our discussions to get involved
We welcome contributions of all kinds! Here's how to get started:
- 🐛 Found a Bug? Report it
- 💡 Have an Idea? Suggest a feature
- 📝 Want to Improve Docs? Help with documentation
- 💻 Ready to Code? Check our Contributing Guide
Every contribution matters, no matter how small! 🚀
- Hydra Documentation - Configuration management
- Pydantic Graph - Stateful workflow execution
- Pydantic AI - Agent-to-agent communication
- PRIME Paper - Original research paper
- Bioinformatics Integration - Multi-source data fusion guide
- Protein Engineering Tools - Tool ecosystem reference
DeepCritical uses dual licensing to maximize open, non-commercial use while reserving rights for commercial applications:
- Source Code: Licensed under GNU General Public License v3 (GPLv3), allowing broad non-commercial use including copying, modification, distribution, and collaboration for personal, educational, research, or non-profit purposes.
- AI Models and Application: Licensed under DeepCritical RAIL-AMS License, permitting non-commercial use subject to use restrictions (no discrimination, military applications, disinformation, or privacy violations), but prohibiting distribution and derivative creation for sharing.
For commercial use or permissions beyond these licenses, contact us to discuss alternative commercial licensing options.