PaperPilot is a CLI research agent for scholarly literature review across AI, biomedicine, and AI for Science.
It turns one user request into a traceable, evidence-based research workflow and generates bilingual reports (zh/en) in Markdown, HTML, and PDF.
PaperPilot is not a chatbot. It is an interactive scientific workflow:
- Parse natural-language research requests
- Build an explicit search protocol with inclusion/exclusion rules
- Query multi-source literature APIs
- Normalize, deduplicate, and screen papers
- Verify URLs/PDF/code availability
- Synthesize evidence and generate review reports
- Output structured artifacts for reproducibility
Each run creates a dedicated folder under runs/ with full state, logs, and intermediate files.
- Natural-language intake with LLM-assisted interpretation
- Interactive shell with:
  - /model to manage LLM profiles
  - /sources to inspect search source/API status
  - /doctor for quick self-checks
- Multi-source retrieval with source registry and diagnostics
- Resume/inspect modes for reproducible research sessions
- Protocol-aware search using plan + diversified keywords
- Canonicalized Paper schema and robust deduplication
- Core/adjacent/excluded paper classification
- PDF + code-link verification (no paywall bypass)
- Optional full-text extraction from downloadable PDFs
- Canonical bilingual report model
- Consistent [1][2][3] citation mapping
- Method taxonomy and evidence matrix
- Markdown + HTML + PDF outputs with aligned content
- Final report view keeps up to 100 papers by default, without a hard minimum
- Obsidian Wiki export with paper, method, topic, and claim notes
- Quality gates and reflection workflow
- Evidence ledger linking claims to corpus evidence
- Review checks for citation compliance and source reliability
- Event stream logs for auditability
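The normalization and deduplication step listed above can be illustrated with a minimal sketch. It assumes papers are plain dicts with hypothetical title and doi fields, and that duplicates are matched by DOI first and by a normalized title otherwise. PaperPilot's actual Paper schema and matching rules may differ.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def dedup_key(paper: dict) -> str:
    """Prefer the DOI as identity; fall back to the normalized title."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    return "title:" + normalize_title(paper.get("title", ""))


def deduplicate(papers: list[dict]) -> list[dict]:
    """Keep the first record seen for each key, preserving input order."""
    seen: set[str] = set()
    unique = []
    for paper in papers:
        key = dedup_key(paper)
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique
```

A record that carries a DOI and a copy of the same paper without one get different keys in this sketch, so a production deduplicator would likely need a second pass that also compares titles across those two groups.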
Default free sources:
- arXiv
- Semantic Scholar
- OpenAlex
- Crossref
- OpenReview
- PubMed / NCBI E-utilities
- Europe PMC
- bioRxiv / medRxiv
- DBLP
- ACL Anthology
- Papers.cool
Optional API-key sources:
- DeepXiv / Agentic Data
- CORE
- Lens.org Scholarly API
- IEEE Xplore
- Springer Nature
- Elsevier / Scopus
- Dimensions
python -m pip install paperpilot -i https://pypi.org/simple
Local development:
git clone https://github.com/CHB-learner/PaperPilot.git
cd PaperPilot
python -m pip install -e .
PaperPilot requires OpenAI-compatible LLM settings for query understanding, planning, synthesis, and report generation.
On first run, it creates an editable configuration template at:
~/.paperpilot/config.json
Minimal default template:
{
  "active": "default",
  "profiles": {
    "default": {
      "api_key": "",
      "base_url": "",
      "model": "gpt-5.2"
    }
  },
  "sources": {
    "core": {"enabled": null, "api_key": "", "base_url": ""},
    "lens": {"enabled": null, "api_key": "", "base_url": ""},
    "ieee": {"enabled": null, "api_key": "", "base_url": ""},
    "springer": {"enabled": null, "api_key": "", "base_url": ""},
    "elsevier": {"enabled": null, "api_key": "", "base_url": ""},
    "dimensions": {"enabled": null, "api_key": "", "base_url": ""},
    "deepxiv": {"enabled": null, "api_key": "", "base_url": ""}
  }
}
Notes:
- Leave optional source API keys empty if unavailable.
- enabled: null means auto-enable once a valid key is provided.
- ~/.paperpilot/config.json is not committed; edit it directly or use CLI commands.
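The enabled: null rule in the notes above can be read as a tri-state flag. The sketch below shows one way to resolve it, assuming that any non-empty api_key counts as provided (it does not check that the key is actually valid) and that an explicit true or false always wins. The function names are illustrative and are not part of PaperPilot's API.

```python
import json
from pathlib import Path


def source_is_enabled(entry: dict) -> bool:
    """Resolve the tri-state `enabled` flag for one source entry."""
    enabled = entry.get("enabled")
    if enabled is None:
        # null: auto-enable only when an API key has been filled in
        return bool((entry.get("api_key") or "").strip())
    return bool(enabled)


def enabled_sources(config: dict) -> list[str]:
    """Return the names of sources that resolve to enabled, sorted."""
    sources = config.get("sources", {})
    return sorted(name for name, entry in sources.items() if source_is_enabled(entry))


def load_config(path: Path = Path.home() / ".paperpilot" / "config.json") -> dict:
    """Read the configuration file as a plain dict."""
    return json.loads(path.read_text(encoding="utf-8"))
```

With the default template, every optional source has an empty key, so enabled_sources(load_config()) would return an empty list under these assumptions.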
PaperPilot config set --base-url https://api.deepseek.com --model deepseek-chat
PaperPilot config import ./api.json
PaperPilot config list
PaperPilot config use deepseek
PaperPilot config show
PaperPilot --doctor
PaperPilot sources list
PaperPilot sources config core
PaperPilot sources config deepxiv
PaperPilot sources enable core
PaperPilot sources test core
Inside interactive mode, use /sources and /doctor.
| Source | Access page |
|---|---|
| CORE | https://core.ac.uk/services/api |
| Lens.org | https://docs.api.lens.org/ |
| IEEE Xplore | https://developer.ieee.org/getting_started |
| Springer Nature | https://dev.springernature.com/ |
| Elsevier / Scopus | https://dev.elsevier.com/ |
| Dimensions | https://docs.dimensions.ai/dsl/api.html |
| DeepXiv / Agentic Data | https://data.rag.ac.cn/api/docs |
| Papers.cool | https://papers.cool |
Interactive usage:
PaperPilot
Command mode example:
PaperPilot "RNA inverse folding sequence design" \
--auto-confirm \
--max-papers 50 \
--since-year 2021 \
--github-filter required \
--sources auto \
--mode apa \
--quality balanced
Import local corpus and skip download:
PaperPilot "RNA inverse folding sequence design" \
--auto-confirm \
--user-corpus ./papers \
--user-corpus references.bib \
--no-download
Inspect/resume workflow:
PaperPilot inspect runs/<task-id>
PaperPilot resume runs/<task-id>
PaperPilot follows this state-machine pipeline:
Intake -> Protocol -> Search -> Corpus -> Screening -> Verification -> Synthesis -> Review -> Report
flowchart LR
U[User request] --> C[Run context]
C --> QA[Query understanding]
QA --> PL[Planning + Protocol]
PL --> ST[Source Registry search]
ST --> NB[Corpus normalization]
NB --> SC[Core/adjacent screening]
SC --> VF[Verification + PDF + code checks]
VF --> SY[Literature matrix]
SY --> QG[Quality gate + reflection]
QG --> EL[Evidence ledger]
EL --> RP["Report render (ZH/EN)"]
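The stage sequence above can be modeled as a small state machine. This is a minimal sketch that assumes a strictly linear order. The real pipeline may loop back (for example after the quality gate and reflection step), and the names Stage, next_stage, and remaining are illustrative only.

```python
from enum import Enum
from typing import Optional


class Stage(Enum):
    INTAKE = "Intake"
    PROTOCOL = "Protocol"
    SEARCH = "Search"
    CORPUS = "Corpus"
    SCREENING = "Screening"
    VERIFICATION = "Verification"
    SYNTHESIS = "Synthesis"
    REVIEW = "Review"
    REPORT = "Report"


ORDER = list(Stage)  # Enum iteration follows definition order


def next_stage(current: Stage) -> Optional[Stage]:
    """Return the stage after `current`, or None once Report is reached."""
    index = ORDER.index(current)
    return ORDER[index + 1] if index + 1 < len(ORDER) else None


def remaining(current: Stage) -> list[Stage]:
    """Stages still to run when resuming from `current` (inclusive)."""
    return ORDER[ORDER.index(current):]
```

A resume command could, under these assumptions, read the last completed stage from the run state and continue with remaining(next_stage(last)).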
runs/<task-id>/ will contain:
- task.json / state.json / events.jsonl / manifest.json
- query_understanding.md / plan.json / protocol.json
- metadata.json / corpus.json / core_papers.json
- adjacent_papers.json / excluded_papers.json / ranked_papers.json
- verification.json / download_log.json / fulltext/ / paper_notes.json
- literature_matrix.json / synthesis.json / quality_gate.json
- evidence_ledger.json / review_agent_findings.json
- report.canonical.json / report.zh.md / report.en.md
- report.zh.html / report.en.html / report.zh.pdf / report.en.pdf
- report_selection.json / shortfall.json (when no reportable papers are available)
- obsidian_wiki/ with index.md, paper notes, method notes, topic notes, claim notes, and wiki lint metadata
- pdfs/ / source_diagnostics.json / registries.json / prompt_manifest.json
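A run folder can be sanity-checked with a few lines of Python. The sketch below looks for a subset of the artifacts listed above and parses a .jsonl file, assuming events.jsonl follows the usual JSON Lines convention of one JSON value per line. The event schema is not documented here, so the reader returns raw dicts. Both helper names are hypothetical.

```python
import json
from pathlib import Path

# A subset of the artifact names listed above; extend as needed.
EXPECTED = ["task.json", "state.json", "events.jsonl", "manifest.json", "report.canonical.json"]


def missing_artifacts(run_dir: Path, expected: list[str] = EXPECTED) -> list[str]:
    """Return expected file names that are absent from the run folder."""
    return [name for name in expected if not (run_dir / name).exists()]


def read_jsonl(path: Path) -> list[dict]:
    """Parse a JSON Lines file: one JSON value per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]
```

For example, missing_artifacts(Path("runs") / task_id) returns an empty list when every name in EXPECTED is present, where task_id stands for the identifier of an existing run.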
Each successful run generates runs/<task-id>/obsidian_wiki/ by default. Open that folder as an Obsidian vault to browse:
- index.md: research entry point and reported-paper overview
- papers/: one note per reported paper with citation label, PDF/code links, method family, and evidence basis
- methods/: method-family notes linked to representative papers
- topics/: query/subtopic notes
- claims/: evidence-map claim notes
- _meta/manifest.json and _meta/wiki_lint.json: provenance, hashes, broken-link checks
Use --no-obsidian-wiki to skip Wiki generation.
GitHub filter modes (--github-filter):
- any: keep all papers and annotate code availability
- required: keep only papers with detected code repositories in final view
- none: keep only papers without detected public code links
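The three --github-filter values described above could be applied roughly as follows. This sketch assumes each paper is a dict with a hypothetical code_url field that holds a detected repository link or is empty. The has_code annotation is likewise an illustrative name, not PaperPilot's actual schema.

```python
def apply_github_filter(papers: list[dict], mode: str = "any") -> list[dict]:
    """Filter papers by detected code availability.

    `code_url` is a hypothetical field holding a detected repository link.
    """
    if mode == "any":
        # keep everything and annotate availability on a copy of each record
        return [{**p, "has_code": bool(p.get("code_url"))} for p in papers]
    if mode == "required":
        return [p for p in papers if p.get("code_url")]
    if mode == "none":
        return [p for p in papers if not p.get("code_url")]
    raise ValueError(f"unknown github filter mode: {mode!r}")
```

Raising on an unknown mode, rather than silently falling back to any, keeps a mistyped flag from changing which papers reach the final view.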
--max-papers INT maximum papers in final report view; default: 100
--min-report-papers INT optional minimum report size; default: 0
--since-year INT preferred lower year bound
--github-filter any|required|none
--github-search-limit INT
--no-download skip PDF downloads
--pdf-limit INT maximum PDFs to download
--user-corpus PATH repeatable local corpus path
--mode quick|apa|systematic
--interaction auto|gated
--quality fast|balanced|strict
--include-adjacent include adjacent papers in appendices
--sources auto|all|core|biomed|cs|configured
--enable-source SOURCE enable one source (repeatable)
--disable-source SOURCE disable one source (repeatable)
--no-obsidian-wiki skip Obsidian Wiki export
See paperpilot --help for full options and Chinese/English output.
- Keep run outputs and generated artifacts out of source control.
- Keep API keys out of git history.
- Prefer .gitignore over manual cleanup.
- Use semantic tags for releases and keep README + docs aligned.
- Keep .github/workflows/*, RELEASING.md, CHANGELOG.md in sync when publishing.
- Ensure ~/.paperpilot/config.json, api.json, and .env with credentials are never committed.
- Add/keep LICENSE and .gitignore.
- Add source code and tags before publishing release assets.
- Publish GitHub Pages from docs/.
- Keep versions in pyproject.toml, literature_agent/__init__.py, and generated manifests aligned.
# dry-run checks only
./scripts/release_everywhere.sh --dry-run
# normal release (pushed commit + tag + GH release + PyPI)
export PYPI_TOKEN='pypi-...'
./scripts/release_everywhere.sh
# release without publishing to PyPI
./scripts/release_everywhere.sh --no-pypi
Suggested publish flow (full):
python -m unittest discover -s tests
python -m compileall literature_agent
./publish_pypi.sh --dry-run --version <VERSION>
git add -A
git commit -m "chore: release v<VERSION>"
git tag -a v<VERSION> -m "v<VERSION>"
git push origin main --tags
./publish_pypi.sh --version <VERSION>
For GitHub Pages: enable Pages to deploy from main + /docs, or rely on .github/workflows/gh-pages.yml.
If you use PaperPilot in your work, include the repository URL and version used so results are reproducible.