Just chat with OpenClaw: "Research X" → done.
🇨🇳 中文 · 🇯🇵 日本語 · 🇰🇷 한국어 · 🇫🇷 Français · 🇩🇪 Deutsch · 🇪🇸 Español · 🇧🇷 Português · 🇷🇺 Русский · 🇸🇦 العربية
```bash
pip install -e . && researchclaw run --topic "Your research idea here" --auto-approve
```

You think it. AutoResearchClaw writes it.
Drop a research topic → get back a full academic paper with real literature from arXiv & Semantic Scholar, hardware-aware sandbox experiments (GPU/MPS/CPU auto-detected), statistical analysis, multi-agent peer review, and conference-ready LaTeX targeting NeurIPS/ICML/ICLR. No babysitting. No copy-pasting. No hallucinated references.
| | Output | What's Inside |
|---|---|---|
| 📄 | `paper_draft.md` | Full academic paper (Introduction, Related Work, Method, Experiments, Results, Conclusion) |
| 📜 | `paper.tex` | Conference-ready LaTeX (NeurIPS / ICLR / ICML templates) |
| 📚 | `references.bib` | Real BibTeX references from Semantic Scholar and arXiv, auto-pruned to match inline citations |
| 🔍 | `verification_report.json` | 4-layer citation integrity + relevance verification (arXiv, CrossRef, DataCite, LLM) |
| 🧪 | `experiment_runs/` | Generated code + sandbox results + structured JSON metrics |
| 📊 | `charts/` | Auto-generated condition comparison charts with error bars and confidence intervals |
| 📝 | `reviews.md` | Multi-agent peer review with methodology-evidence consistency checks |
| 🧬 | `evolution/` | Self-learning lessons extracted from each run |
| 📦 | `deliverables/` | All final outputs in one folder, compile-ready for Overleaf |
The pipeline runs end-to-end without human intervention. When experiments fail, it self-heals. When hypotheses don't hold, it pivots. When citations are fake, it kills them.
```bash
# 1. Clone & install
git clone https://github.com/Jiaaqiliu/AutoResearchClaw.git
cd AutoResearchClaw
python3 -m venv .venv && source .venv/bin/activate
pip install -e .

# 2. Configure
cp config.researchclaw.example.yaml config.arc.yaml
# Edit config.arc.yaml: set your LLM API endpoint and key

# 3. Run
export OPENAI_API_KEY="sk-..."
researchclaw run --config config.arc.yaml --topic "Your research idea" --auto-approve
```

Output → `artifacts/rc-YYYYMMDD-HHMMSS-<hash>/deliverables/`: compile-ready LaTeX, BibTeX, experiment code, charts.
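To grab the newest run's outputs programmatically, a minimal sketch that relies only on the `rc-YYYYMMDD-HHMMSS-<hash>` naming shown above (not a documented API):

```python
from pathlib import Path

# Timestamped rc-* directory names sort chronologically, so the last
# glob match is the most recent run.
runs = sorted(Path("artifacts").glob("rc-*"))
latest = runs[-1] / "deliverables"
print(latest / "paper.tex")  # compile-ready LaTeX lives here
```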
📌 Minimum required config
```yaml
project:
  name: "my-research"
research:
  topic: "Your research topic here"
llm:
  base_url: "https://api.openai.com/v1"
  api_key_env: "OPENAI_API_KEY"
  primary_model: "gpt-4o"
  fallback_models: ["gpt-4o-mini"]
experiment:
  mode: "sandbox"
  sandbox:
    python_path: ".venv/bin/python"
```

| Capability | How It Works |
|---|---|
| 🔄 PIVOT / REFINE Loop | Stage 15 autonomously decides: PROCEED, REFINE (tweak params), or PIVOT (new direction). Artifacts auto-versioned. |
| 🤖 Multi-Agent Debate | Hypothesis generation, result analysis, and peer review each use structured multi-perspective debate. |
| 🧬 Self-Learning | Lessons extracted per run (decision rationale, runtime warnings, metric anomalies) with 30-day time-decay (see the sketch below). Future runs learn from past mistakes. |
| 📚 Knowledge Base | Every run builds a structured KB across 6 categories (decisions, experiments, findings, literature, questions, reviews). |
| 🛡️ Sentinel Watchdog | Background quality monitor: NaN/Inf detection, paper-evidence consistency, citation relevance scoring, anti-fabrication guard. |
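The 30-day time-decay is easiest to picture as an age-based weight on each stored lesson. A minimal sketch, assuming exponential decay with a 30-day half-life (the decay curve is an assumption; only the 30-day horizon comes from the table above):

```python
import time
from typing import Optional

def lesson_weight(created_at: float, now: Optional[float] = None,
                  half_life_days: float = 30.0) -> float:
    """Weight of a stored lesson, halving every `half_life_days`."""
    if now is None:
        now = time.time()
    age_days = (now - created_at) / 86_400  # seconds per day
    return 0.5 ** (age_days / half_life_days)

# A lesson from 30 days ago counts half as much as one from just now:
print(lesson_weight(time.time() - 30 * 86_400))  # ~0.5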
AutoResearchClaw is an OpenClaw-compatible service. Install it in OpenClaw and launch autonomous research with a single message, or use it standalone via CLI, Claude Code, or any AI coding assistant.
If you already use OpenClaw as your AI assistant:
1️⃣ Share the GitHub repo URL with OpenClaw
2️⃣ OpenClaw auto-reads `RESEARCHCLAW_AGENTS.md` → understands the pipeline
3️⃣ Say: "Research [your topic]"
4️⃣ Done → OpenClaw clones, installs, configures, runs, and returns results
That's it. OpenClaw handles git clone, pip install, config setup, and pipeline execution automatically. You just chat.
💡 What happens under the hood
- OpenClaw reads `RESEARCHCLAW_AGENTS.md` → learns the research orchestrator role
- OpenClaw reads `README.md` → understands installation and pipeline structure
- OpenClaw copies `config.researchclaw.example.yaml` → `config.yaml`
- Asks for your LLM API key (or uses your environment variable)
- Runs `pip install -e .` + `researchclaw run --topic "..." --auto-approve`
- Returns the paper, LaTeX, experiments, and citations
For deeper integration, AutoResearchClaw includes a bridge adapter system with 6 optional capabilities:
```yaml
# config.arc.yaml
openclaw_bridge:
  use_cron: true           # ⏰ Scheduled research runs
  use_message: true        # 💬 Progress notifications (Discord/Slack/Telegram)
  use_memory: true         # 🧠 Cross-session knowledge persistence
  use_sessions_spawn: true # 🔀 Spawn parallel sub-sessions for concurrent stages
  use_web_fetch: true      # 🌐 Live web search during literature review
  use_browser: false       # 🖥️ Browser-based paper collection
```

Each flag activates a typed adapter protocol. When OpenClaw provides these capabilities, the adapters consume them without code changes. See `docs/integration-guide.md` for full details.
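To make "typed adapter protocol" concrete, here is a minimal sketch of the pattern using hypothetical names (`MessageAdapter`, `send`); the real interfaces live in `docs/integration-guide.md`:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class MessageAdapter(Protocol):
    """Capability behind use_message: push progress notifications."""
    def send(self, channel: str, text: str) -> None: ...

class ConsoleMessageAdapter:
    """Fallback used when OpenClaw does not provide the capability."""
    def send(self, channel: str, text: str) -> None:
        print(f"[{channel}] {text}")

def notify(adapter: MessageAdapter, stage: str) -> None:
    # The pipeline depends only on the protocol, so an OpenClaw-provided
    # adapter can be swapped in without code changes.
    adapter.send("progress", f"Stage complete: {stage}")

notify(ConsoleMessageAdapter(), "LITERATURE_COLLECT")
```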
| Method | How |
|---|---|
| Standalone CLI | `researchclaw run --topic "..." --auto-approve` |
| Python API | `from researchclaw.pipeline import Runner; Runner(config).run()` (see the sketch below) |
| Claude Code | Reads `RESEARCHCLAW_CLAUDE.md` → just say "Run research on [topic]" |
| OpenCode | Reads `.claude/skills/` → same natural-language interface |
| Any AI CLI | Provide `RESEARCHCLAW_AGENTS.md` as context → agent auto-bootstraps |
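A minimal standalone sketch of the Python API row, where `Runner` comes from the table above and `load_config` is a hypothetical helper standing in for however your install parses `config.arc.yaml`:

```python
from researchclaw.pipeline import Runner

def load_config(path: str) -> dict:
    """Hypothetical stand-in: parse the YAML config into a plain dict."""
    import yaml  # PyYAML
    with open(path) as f:
        return yaml.safe_load(f)

config = load_config("config.arc.yaml")
Runner(config).run()  # runs the full pipeline end-to-end
```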
```
Phase A: Research Scoping            Phase E: Experiment Execution
  1. TOPIC_INIT                       12. EXPERIMENT_RUN
  2. PROBLEM_DECOMPOSE                13. ITERATIVE_REFINE  ← self-healing

Phase B: Literature Discovery        Phase F: Analysis & Decision
  3. SEARCH_STRATEGY                  14. RESULT_ANALYSIS   ← multi-agent
  4. LITERATURE_COLLECT ← real API    15. RESEARCH_DECISION ← PIVOT/REFINE
  5. LITERATURE_SCREEN  [gate]
  6. KNOWLEDGE_EXTRACT               Phase G: Paper Writing
                                      16. PAPER_OUTLINE
Phase C: Knowledge Synthesis          17. PAPER_DRAFT
  7. SYNTHESIS                        18. PEER_REVIEW       ← evidence check
  8. HYPOTHESIS_GEN ← debate          19. PAPER_REVISION

Phase D: Experiment Design           Phase H: Finalization
  9. EXPERIMENT_DESIGN [gate]         20. QUALITY_GATE      [gate]
 10. CODE_GENERATION                  21. KNOWLEDGE_ARCHIVE
 11. RESOURCE_PLANNING                22. EXPORT_PUBLISH    ← LaTeX
                                      23. CITATION_VERIFY   ← relevance check
```
Gate stages (5, 9, 20) pause for human approval, or auto-approve with `--auto-approve`. On rejection, the pipeline rolls back.
Decision loops: Stage 15 can trigger REFINE (→ Stage 13) or PIVOT (→ Stage 8), with automatic artifact versioning.
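The decision loop reduces to a routing table. A sketch using the stage names from the diagram (the function itself is illustrative, not the real API):

```python
def next_stage(decision: str) -> str:
    """Route Stage 15's verdict; artifacts are versioned before any jump."""
    return {
        "PROCEED": "16_PAPER_OUTLINE",    # continue into Phase G (writing)
        "REFINE":  "13_ITERATIVE_REFINE", # tweak parameters, rerun experiments
        "PIVOT":   "8_HYPOTHESIS_GEN",    # generate a new research direction
    }[decision]

assert next_stage("REFINE") == "13_ITERATIVE_REFINE"
```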
📋 What Each Phase Does
| Phase | What Happens |
|---|---|
| A: Scoping | LLM decomposes the topic into a structured problem tree with research questions |
| A+: Hardware | Auto-detects GPU (NVIDIA CUDA / Apple MPS / CPU-only), warns if local hardware is limited, adapts code generation accordingly |
| B: Literature | Multi-source search (arXiv-first, then Semantic Scholar) for real papers, screens by relevance, extracts knowledge cards |
| C: Synthesis | Clusters findings, identifies research gaps, generates testable hypotheses via multi-agent debate |
| D: Design | Designs the experiment plan, generates hardware-aware runnable Python (GPU tier → package selection), estimates resource needs |
| E: Execution | Runs experiments in the sandbox, detects NaN/Inf and runtime bugs, self-heals code via targeted LLM repair (see the sketch after this table) |
| F: Analysis | Multi-agent analysis of results; autonomous PROCEED / REFINE / PIVOT decision with rationale |
| G: Writing | Outline → section-by-section drafting (5,000-6,500 words) → peer review (with methodology-evidence consistency checks) → revision with length guard |
| H: Finalization | Quality gate, knowledge archival, LaTeX export with conference template, citation integrity + relevance verification |
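As referenced in the Phase E row, a minimal sketch of a NaN/Inf fast-fail check, assuming experiment metrics arrive as a flat dict of floats (the real harness hooks deeper into the run):

```python
import math

def assert_finite(metrics: dict) -> None:
    """Fail fast if any metric went NaN or infinite, triggering self-heal."""
    bad = {k: v for k, v in metrics.items()
           if isinstance(v, float) and not math.isfinite(v)}
    if bad:
        raise RuntimeError(f"Non-finite metrics detected: {bad}")

assert_finite({"val_loss": 0.42})            # passes
# assert_finite({"val_loss": float("nan")})  # raises -> repair loop kicks in
```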
| Feature | Description |
|---|---|
| 📚 Multi-Source Literature | Real papers from arXiv (primary) + Semantic Scholar, with query expansion, deduplication, and a circuit breaker for graceful degradation |
| 🔍 4-Layer Citation Verification | arXiv ID check → CrossRef/DataCite DOI → Semantic Scholar title match → LLM relevance scoring. Hallucinated refs auto-removed (see the sketch after this table). |
| 🖥️ Hardware-Aware Execution | Auto-detects GPU (NVIDIA CUDA / Apple MPS / CPU-only) and adapts code generation, imports, and experiment scale accordingly |
| 🧪 Sandbox Experiments | AST-validated code, immutable harness, NaN/Inf fast-fail, self-healing repair, iterative refinement (up to 10 rounds), partial result capture |
| 📝 Conference-Grade Writing | NeurIPS/ICML/ICLR templates, section-by-section drafting (5,000-6,500 words), anti-fabrication guard, revision length guard, anti-disclaimer enforcement |
| 🔀 Template Switching | `neurips_2025`, `iclr_2026`, `icml_2026`; Markdown → LaTeX with math, tables, figures, cross-refs, `\cite{}` |
| 🚦 Quality Gates | 3 human-in-the-loop gates (Stages 5, 9, 20) with rollback. Skip with `--auto-approve`. |
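As referenced in the citation-verification row, an illustrative sketch of just the first layer: an arXiv ID existence check against arXiv's public export API. The error-detection heuristic and function name are assumptions, not the project's actual code:

```python
import urllib.request

def arxiv_id_exists(arxiv_id: str) -> bool:
    """Layer 1 of 4: does this arXiv ID resolve to a real paper?"""
    url = f"http://export.arxiv.org/api/query?id_list={arxiv_id}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        feed = resp.read().decode("utf-8")
    # Valid IDs return an <entry> with the paper title; bad IDs return
    # either no entry or an entry titled "Error".
    return "<entry>" in feed and "<title>Error" not in feed

print(arxiv_id_exists("1706.03762"))  # "Attention Is All You Need" -> True
```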
Click to expand full configuration reference
```yaml
# === Project ===
project:
  name: "my-research"                 # Project identifier
  mode: "docs-first"                  # docs-first | semi-auto | full-auto

# === Research ===
research:
  topic: "..."                        # Research topic (required)
  domains: ["ml", "nlp"]              # Research domains for literature search
  daily_paper_count: 8                # Target papers per search query
  quality_threshold: 4.0              # Minimum quality score for papers

# === Runtime ===
runtime:
  timezone: "America/New_York"        # For timestamps
  max_parallel_tasks: 3               # Concurrent experiment limit
  approval_timeout_hours: 12          # Gate stage timeout
  retry_limit: 2                      # Retry count on stage failure

# === LLM ===
llm:
  provider: "openai-compatible"       # Provider type
  base_url: "https://..."             # API endpoint (required)
  api_key_env: "OPENAI_API_KEY"       # Env var for API key (required)
  api_key: ""                         # Or hardcode key here
  primary_model: "gpt-4o"             # Primary model
  fallback_models: ["gpt-4o-mini"]    # Fallback chain
  s2_api_key: ""                      # Semantic Scholar API key (optional, higher rate limits)

# === Experiment ===
experiment:
  mode: "sandbox"                     # simulated | sandbox | ssh_remote
  time_budget_sec: 600                # Max execution time per run (default: 600s)
  max_iterations: 10                  # Max optimization iterations
  metric_key: "val_loss"              # Primary metric name
  metric_direction: "minimize"        # minimize | maximize
  sandbox:
    python_path: ".venv/bin/python"
    gpu_required: false
    allowed_imports: [math, random, json, csv, numpy, torch, sklearn]
    max_memory_mb: 4096
  ssh_remote:
    host: ""                          # GPU server hostname
    gpu_ids: []                       # Available GPU IDs
    remote_workdir: "/tmp/researchclaw_experiments"

# === Export ===
export:
  target_conference: "neurips_2025"   # neurips_2025 | iclr_2026 | icml_2026
  authors: "Anonymous"
  bib_file: "references"

# === Prompts ===
prompts:
  custom_file: ""                     # Path to custom prompts YAML (empty = defaults)

# === Security ===
security:
  hitl_required_stages: [5, 9, 20]    # Stages requiring human approval
  allow_publish_without_approval: false
  redact_sensitive_logs: true

# === Knowledge Base ===
knowledge_base:
  backend: "markdown"                 # markdown | obsidian
  root: "docs/kb"

# === Notifications ===
notifications:
  channel: "console"                  # console | discord | slack
  target: ""

# === OpenClaw Bridge ===
openclaw_bridge:
  use_cron: false                     # Scheduled research runs
  use_message: false                  # Progress notifications
  use_memory: false                   # Cross-session knowledge persistence
  use_sessions_spawn: false           # Spawn parallel sub-sessions
  use_web_fetch: false                # Live web search
  use_browser: false                  # Browser-based paper collection
```

Inspired by:
- 🔬 AI Scientist (Sakana AI): automated research pioneer
- 🧠 AutoResearch (Andrej Karpathy): end-to-end research automation
- 🌟 FARS (Analemma): Fully Automated Research System
MIT. See LICENSE for details.
If you find AutoResearchClaw useful, please cite:
```bibtex
@misc{liu2026autoresearchclaw,
  author = {Liu, Jiaqi and Xia, Peng and Han, Siwei and Qiu, Shi and Zhang, Letian and Chen, Guiming and Tu, Haoqin and Yang, Xinyu and Zhou, Jiawei and Zhu, Hongtu and Li, Yun and Zheng, Zeyu and Xie, Cihang and Ding, Mingyu and Yao, Huaxiu},
  title = {AutoResearchClaw: Fully Autonomous Research from Idea to Paper},
  year = {2026},
  organization = {GitHub},
  url = {https://github.com/aiming-lab/AutoResearchClaw},
}
```

Built with 🦞 by the AutoResearchClaw team

