DeltaScience

A verification layer for scientific work. Paste any scientific text — or a whole PDF — and every PMID, DOI, arXiv ID and GitHub repo is checked against the real record: does it exist, does its metadata match, does the cited paper support the claim? Fabricated and mis-cited references are flagged. Deterministic, no API key.

▶ Live demo & interactive verifier — paste a paragraph, watch it catch a fabricated or mis-cited reference.

What is DeltaScience?

DeltaScience checks the citations in scientific writing against the real record — whether you drafted the text yourself or with an AI. Its core is a citation verifier: paste a paragraph, a hypothesis, or a whole paper, and every identifier is checked against PubMed / Crossref / OpenAlex / arXiv / DataCite / GitHub — does it exist, does its metadata match, and does the cited paper actually support the claim? It is deterministic (real lookups + string matching, no LLM in the trust path), so it runs with no API key.
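As a rough illustration of the first deterministic step (finding identifiers before any lookup happens), here is a minimal Python sketch. The regular expressions and the `extract_identifiers` name are assumptions made for this example, not DeltaScience's actual extraction rules, and the real tool presumably handles more identifier spellings than these.

```python
import re

# Hypothetical patterns for illustration only; the real extraction rules may differ.
ID_PATTERNS = {
    "pmid": re.compile(r"\bPMID:?\s*(\d{1,8})\b", re.IGNORECASE),
    "doi": re.compile(r"\b(10\.\d{4,9}/[^\s\"<>]+)", re.IGNORECASE),
    "arxiv": re.compile(r"\barXiv:\s*(\d{4}\.\d{4,5})(?:v\d+)?", re.IGNORECASE),
    "github": re.compile(r"\bgithub\.com/([\w.-]+/[\w.-]+)", re.IGNORECASE),
}


def extract_identifiers(text):
    """Return sorted, de-duplicated (kind, value) pairs found in text."""
    found = set()
    for kind, pattern in ID_PATTERNS.items():
        for match in pattern.finditer(text):
            # Drop sentence punctuation that the greedy patterns may swallow.
            value = match.group(1).rstrip(".,;)")
            found.add((kind, value))
    return sorted(found)


if __name__ == "__main__":
    sample = "X drives Y (PMID 35562209). Code: https://github.com/boheling/deltasci."
    for kind, value in extract_identifiers(sample):
        print(kind, value)
```

Each extracted pair would then be looked up in the matching registry; that network step is deliberately left out here. Note that stripping a trailing `)` is a simplification and would damage the rare DOI that legitimately ends in one.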

Around that trust path it adds a discovery layer: scan (retrieve the closest real prior work) and gap (a coverage-honest read of how crowded an area is), plus a workflow layer that runs the right ones for your goal (grant, paper, review, ideate). The principle is "no LLM in the trust path", not "no LLM anywhere": verify must be deterministic, but discovery is better with intelligence. So scan and gap run keyless out of the box (deterministic term-overlap retrieval) and get sharper when an agent drives them: pass your own queries with scan --query "…", or let the deltasci-ground skill write the queries and reason over the results. A weak discovery pass can only make you miss prior art; it can never corrupt a verdict.
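To make "deterministic term-overlap retrieval" concrete, the sketch below ranks candidate titles by Jaccard overlap of their word sets with the query. Jaccard scoring, the tokenizer, and the function names are assumptions chosen for illustration; the scoring that scan actually uses is not specified here.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens of length >= 3, as a set."""
    return {t for t in re.findall(r"[a-z0-9]+", text.lower()) if len(t) >= 3}


def overlap_score(query, title):
    """Jaccard overlap between the token sets of query and title."""
    q, t = tokenize(query), tokenize(title)
    if not q or not t:
        return 0.0
    return len(q & t) / len(q | t)


def rank(query, titles, top_k=3):
    """Return up to top_k (title, score) pairs, ties broken by input order."""
    scored = [(overlap_score(query, title), i, title) for i, title in enumerate(titles)]
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [(title, score) for score, _, title in scored[:top_k]]


if __name__ == "__main__":
    candidates = [
        "Graph neural networks for cathode discovery",
        "Precipitation downscaling with neural emulators",
        "Spinel cathode voltage prediction",
    ]
    for title, score in rank("graph neural network cathode", candidates):
        print(f"{score:.3f}  {title}")
```

The same inputs always give the same ranking, which is the property that lets this kind of retrieval run without a key. Its weakness is also visible: "network" and "networks" do not match, which is the sort of miss an agent-written query can compensate for.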

It also includes the two-perspective co-reasoning mode it grew out of — a domain scientist and an ML engineer talk an idea through in structured rounds (deltasci run, needs an LLM), documented below. The verifier is the part that stands on its own.

It ships in two form factors:

pip install deltasci — a Python CLI + library.
Claude Code skills — install skill/ for the two-perspective hypothesis mode (deltasci), and/or skill-ground/ for the grounding layer (deltasci-ground), where the agent drives scan → gap and calls the deterministic engine to verify.

Both share the same domain packs (biomed, materials, climate, plus your own).

Why two perspectives?

Free-form chatbot brainstorming gives plausible-sounding research ideas with no grounding. Single-prompt "be a scientist" approaches collapse two distinct expertises (domain mechanism, ML feasibility) into one voice and produce mush. DeltaScience keeps them separate:

Round 1  Domain Scientist  → mechanism, unmet need, prior evidence, constraints
Round 2  ML Engineer       → data representation, method, precedents, risks
Round 3  Domain Scientist  → refinement, evaluation realism, falsifiable prediction
Round 4  ML Engineer       → revised plan, formulas, implementation, expected outcomes
                ↓
            Synthesis: a grounded, falsifiable hypothesis with citation trail

A complete deltasci run produces six artifacts: hypothesis, experiment protocol, risk register, adversarial challenger findings, citation audit, and a transcript of the dialogue that produced them — all auditable, all in one navigable run directory.

Eight things make it different from "just prompting an LLM":

Aspect	DeltaScience	Free-form LLM
Grounding	Every claim tagged with type, source, AND AI's training-coverage self-assessment (`well-covered` / `sparse`)	Untagged; mixes facts and vibes
Honest about AI's edges	Material outside training distribution is emitted as `[KNOWLEDGE_GAP]` for the researcher, not fabricated	Confabulates citations to fill gaps
Falsifiability gate	Synthesis refuses to emit a hypothesis without a measurable threshold	"It might work!"
Epistemic humility gate	Synthesis refuses if zero `KNOWLEDGE_GAP` and zero `NOVEL_SYNTHESIS` across the dialogue (a complete-certainty transcript is itself a hallucination signal)	No such check
Citation audit pillar	Every PMID/DOI/arXiv ID/GitHub repo/GEO accession in a CLAIM is verified against the real PubMed/Crossref/OpenAlex/GitHub record; metadata mismatches surface as a prominent "FAILED AUDIT" section (this catches the BioIntel-style failure where a fabricated PMID got a green checkmark)	"I checked it" with no actual fetch
Adversarial challenger	A second-opinion model (optionally a different provider via `--challenger-llm`) tries to break the hypothesis; its findings are first-class output and its own citations get audited too	None
Structured protocol + risks	Hypothesis ships with a concrete 6-step experiment plan and a ranked risk register; both flow through the same audit pillar	Hypothesis text only
Domain awareness	Pluggable domain packs parameterize the expert lens	Generic "be a scientist"
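The two gates in the table reduce to simple checks over the dialogue. The following Python sketch shows one way to express them; the `DialogueSummary` container and its field names are hypothetical and do not reflect DeltaScience's internal schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DialogueSummary:
    # Hypothetical container; field names are illustrative only.
    threshold: Optional[str]  # measurable falsifiability threshold, if any
    knowledge_gaps: int       # number of KNOWLEDGE_GAP tags in the dialogue
    novel_syntheses: int      # number of NOVEL_SYNTHESIS tags in the dialogue


def gate_failures(summary):
    """Return the reasons synthesis should be refused (empty list means pass)."""
    failures = []
    # Falsifiability gate: a hypothesis needs a measurable threshold.
    if not (summary.threshold and summary.threshold.strip()):
        failures.append("falsifiability: no measurable threshold")
    # Epistemic humility gate: total certainty is treated as a warning sign.
    if summary.knowledge_gaps == 0 and summary.novel_syntheses == 0:
        failures.append("epistemic humility: no KNOWLEDGE_GAP or NOVEL_SYNTHESIS")
    return failures


if __name__ == "__main__":
    print(gate_failures(DialogueSummary(threshold=None, knowledge_gaps=0, novel_syntheses=0)))
```

Returning a list of reasons rather than a boolean keeps the refusal explainable, which fits the tool's emphasis on auditable output.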

The three first-class tags

[CLAIM type=<TYPE> coverage=<COVERAGE> source="<CITATION>"]<text>[/CLAIM]
[KNOWLEDGE_GAP category=<CATEGORY>]<question for the researcher>[/KNOWLEDGE_GAP]
[NOVEL_SYNTHESIS rationale="<one-line>"]<the connection you're proposing>[/NOVEL_SYNTHESIS]

coverage ∈ {well-covered, sparse}. Material the AI thinks is uncovered (lab-tribal, paywalled, niche, non-English, the researcher's pilot data, etc.) is never allowed as a CLAIM — it must be a KNOWLEDGE_GAP. Connections the AI is making (not citing) are NOVEL_SYNTHESIS. This is exactly the line between "AI knows what's well-discussed online" and "the researcher knows what's at the frontier of their field" — DeltaScience surfaces it instead of papering over it.
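If you want to post-process a transcript yourself, the tag grammar above is regular enough to parse with two regular expressions. This is a minimal sketch, not the project's parser: the `parse_tags` name is made up for this example, the `MECHANISM` and `pilot_data` values in the demo are invented, and the pattern assumes attribute values never contain a `]` character.

```python
import re

TAG_RE = re.compile(
    r"\[(?P<tag>CLAIM|KNOWLEDGE_GAP|NOVEL_SYNTHESIS)(?P<attrs>[^\]]*)\]"
    r"(?P<body>.*?)\[/(?P=tag)\]",
    re.DOTALL,
)
# key="quoted value" or key=bare_value
ATTR_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')
ALLOWED_COVERAGE = {"well-covered", "sparse"}


def parse_tags(text):
    """Return a list of {'tag', 'attrs', 'body'} dicts in document order."""
    tags = []
    for m in TAG_RE.finditer(text):
        attrs = {}
        for key, quoted, bare in ATTR_RE.findall(m.group("attrs")):
            attrs[key] = quoted if quoted else bare
        # A CLAIM must declare one of the two permitted coverage levels.
        if m.group("tag") == "CLAIM" and attrs.get("coverage") not in ALLOWED_COVERAGE:
            raise ValueError(f"CLAIM with invalid coverage: {attrs.get('coverage')!r}")
        tags.append({"tag": m.group("tag"), "attrs": attrs, "body": m.group("body").strip()})
    return tags


if __name__ == "__main__":
    demo = (
        '[CLAIM type=MECHANISM coverage=sparse source="PMID 35562209"]X drives Y.[/CLAIM] '
        "[KNOWLEDGE_GAP category=pilot_data]What did your pilot show?[/KNOWLEDGE_GAP]"
    )
    for tag in parse_tags(demo):
        print(tag["tag"], tag["attrs"], tag["body"])
```

Rejecting a CLAIM with any other coverage value mirrors the rule that uncovered material must be emitted as a KNOWLEDGE_GAP instead.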

Install

pip install deltasci                    # core (CLI + library)
pip install "deltasci[anthropic]"       # + Anthropic adapter
pip install "deltasci[openai]"          # + OpenAI adapter
pip install "deltasci[all]"             # both

Provider keys are read from the environment (ANTHROPIC_API_KEY or OPENAI_API_KEY).

Quick start

One-liner (Materials)

export ANTHROPIC_API_KEY=...
deltasci run \
  --pack materials \
  --idea "Train a graph neural network on the Materials Project to identify Li-ion cathode candidates in the spinel family with predicted voltage > 4.3V."

Outputs go to ./deltasci-output/<timestamp>_<slug>/ in a numbered staged layout:

00_idea.md
01_framing/      02_engineering/    03_refinement/    04_plan/      (per-round transcripts)
05_synthesis/    hypothesis.md + summary.json (three-section evidence trail + falsifiability)
06_protocol/     protocol.md + experiment_plan.json (concrete, execution-ready plan)
07_risks/        risks.md + risk_register.json (5–10 ranked failure modes + mitigations)
08_audits/       citations.json + codex.md (citation audit + adversarial challenger)
manifest.json    (run-level metadata)
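Because the layout is fixed, a run directory is easy to consume from a script. The sketch below loads whichever of the JSON artifacts listed above exist. It assumes that run directory names sort chronologically (the exact timestamp format is not documented here) and makes no assumption about the contents of each JSON file; the function names are invented for this example.

```python
import json
from pathlib import Path

# Relative paths taken from the staged layout described above.
JSON_ARTIFACTS = [
    "05_synthesis/summary.json",
    "06_protocol/experiment_plan.json",
    "07_risks/risk_register.json",
    "08_audits/citations.json",
    "manifest.json",
]


def latest_run(output_dir="./deltasci-output"):
    """Return the run directory whose name sorts last, or None if there is none."""
    runs = sorted(p for p in Path(output_dir).glob("*") if p.is_dir())
    return runs[-1] if runs else None


def load_artifacts(run_dir):
    """Load the JSON artifacts present in run_dir, keyed by relative path."""
    loaded = {}
    for rel in JSON_ARTIFACTS:
        path = Path(run_dir) / rel
        if path.is_file():
            loaded[rel] = json.loads(path.read_text(encoding="utf-8"))
    return loaded


if __name__ == "__main__":
    run = latest_run()
    if run is None:
        print("no runs found")
    else:
        for rel, data in load_artifacts(run).items():
            print(rel, type(data).__name__)
```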

The web UI, opened with deltasci view <run-dir> (and auto-launched at the end of deltasci run unless you pass --no-view), renders all of this alongside the audit results, including any FAILED AUDIT section, which shows both what the AI claimed and what was actually found at the cited identifier.

Try it without an API key

deltasci demo --pack biomed --llm mock

This runs a deterministic mock LLM end-to-end so you can see the output shape.

Inside Claude Code

git clone https://github.com/boheling/deltasci
cd deltasci
bash skill/install.sh         # deltasci — two-perspective hypothesis mode
bash skill-ground/install.sh  # deltasci-ground — the scan → gap → verify grounding layer

Then in Claude Code, for the grounding layer (the agent writes the queries and reasons; verify stays deterministic — no key):

"Ground this idea: an experience-learning framework with RL for LLM-agent skill evolution." "Verify the citations in paper.pdf."

…or for the hypothesis mode:

"Use deltasci with the climate pack to generate a hypothesis for: train a neural emulator on ERA5 to downscale Sahel precipitation."

Verify citations in any text (no run required)

DeltaScience's citation-audit pillar also ships as a standalone verifier you can point at any LLM-generated scientific text — a pasted related-work section, a JSON list of claims, or a .bib file. It checks that each cited PMID / DOI / arXiv / GitHub identifier exists, that its metadata matches, and (by default) that the cited paper actually supports the claim — catching the "real paper, wrong citation" failure that plagues autonomous AI-scientist pipelines. No provider API key required.

deltasci verify --file related_work.md          # untagged prose
echo "X drives Y (PMID 35562209)." | deltasci verify --file -
deltasci verify --text '…' --json               # machine output; exit code 2 on any failed audit

Each claim gets a verdict: PASS / FABRICATED / METADATA-MISMATCH / UNSUPPORTED / UNVERIFIABLE / SKIPPED.
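The documented exit code makes the verifier easy to use as a gate in a script or CI job. Here is a minimal Python wrapper, offered as a sketch: only exit code 2 ("any failed audit") is documented above, so treating 0 as a clean pass and every other code as an error is an assumption, and the JSON output is returned unparsed because its schema is not described here.

```python
import subprocess


def run_verify(text, runner=subprocess.run):
    """Run `deltasci verify --text ... --json`; return (audit_failed, raw_stdout).

    `runner` is injectable so the function can be exercised without the CLI.
    """
    proc = runner(
        ["deltasci", "verify", "--text", text, "--json"],
        capture_output=True,
        text=True,
    )
    # Assumption: 0 = nothing failed, 2 = at least one failed audit,
    # anything else = the tool itself errored.
    if proc.returncode not in (0, 2):
        raise RuntimeError(f"deltasci verify exited with {proc.returncode}: {proc.stderr}")
    return proc.returncode == 2, proc.stdout


if __name__ == "__main__":
    failed, output = run_verify("X drives Y (PMID 35562209).")
    print("FAILED AUDIT" if failed else "no failed audits")
    print(output)
```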

As an MCP server

Verify generated citations from inside any MCP client (Claude Code/Desktop, Cursor) or AI-scientist pipeline — without forking anything:

pip install "deltasci[mcp]"
claude mcp add deltasci-verify -- deltasci-mcp

It exposes one tool, verify_scientific_claims(text, format, check_support), returning the same per-claim verdicts.

Verify a whole paper (PDF)

Real papers cite by number, with the references in a bibliography at the bottom — so a pasted paragraph only has [12], nothing to resolve. Paper mode ingests the whole document: it parses the bibliography, resolves every reference to a real record (embedded DOI/PMID/arXiv, or a Crossref title lookup), links each in-text marker to its reference, and checks each citation in the context of the sentence that cites it.

pip install "deltasci[pdf]"
deltasci verify --pdf paper.pdf                 # verify every numbered citation in context
deltasci verify --pdf paper.pdf --max-references 30   # fast first pass on a big bibliography
deltasci verify --paper --file paper.txt        # pasted full text (body + references)
deltasci verify --pdf paper.pdf --llm anthropic # LLM fallback for author-year / messy bibliographies

The web UI (/verify) also accepts a PDF upload and shows one card per citation — its verdict, the in-text sentence it was cited in, and a link to the real record. Deterministic by default (no API key); the --llm fallback only structures messy bibliographies — every citation is still verified against the real record deterministically.

Note: arXiv references are verified via their DataCite DOI (reliable, no rate-limit issues), so arXiv-heavy CS papers work too. The claim-to-abstract support check is PubMed-only — non-PubMed references get existence + metadata verification rather than claim-context. Author-year bibliographies (no [n] numbers) are handled by extracting and verifying every cited identifier; use --llm for full per-claim context on those.
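The marker-to-reference linking step described in this section can be pictured with a small Python sketch. It assumes a bibliography with one `[n] ...` entry per line and markers placed before the sentence's closing punctuation; real PDFs are far messier, and the function names here are invented for illustration, not taken from DeltaScience.

```python
import re

REF_LINE = re.compile(r"^\[(\d+)\]\s+(.*)$")
MARKER = re.compile(r"\[(\d+(?:\s*,\s*\d+)*)\]")


def parse_bibliography(ref_text):
    """Map reference number -> reference string, one '[n] ...' entry per line."""
    refs = {}
    for line in ref_text.splitlines():
        m = REF_LINE.match(line.strip())
        if m:
            refs[int(m.group(1))] = m.group(2)
    return refs


def link_citations(body, refs):
    """Yield (number, citing sentence, reference or None) for each in-text marker."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", body):
        for m in MARKER.finditer(sentence):
            for num in re.split(r"\s*,\s*", m.group(1)):
                n = int(num)
                yield n, sentence.strip(), refs.get(n)


if __name__ == "__main__":
    refs = parse_bibliography("[1] Smith J. X and Y. doi:10.1000/xyz123\n[2] Lee K. On Z.")
    for n, sentence, ref in link_citations("X drives Y [1]. Z is unrelated [2, 3].", refs):
        print(n, "|", sentence, "|", ref or "UNRESOLVED")
```

A marker with no matching bibliography entry comes back as `None`, which is the point at which a verifier would report the citation as unresolvable instead of guessing.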

Built-in domain packs

Pack	Display name	What it lenses
`biomed`	Biomedical Sciences	mechanism, patient framing, evidence base, IRB / regulatory pathway, translational realism
`materials`	Materials Science	first principles, composition/structure space, DFT bias, synthesizability, validation pathway
`climate`	Climate & Earth Sciences	physical conservation, observational data ecosystem, statistical regime, decision relevance

List them: deltasci list-packs. Inspect one: deltasci show-pack biomed.

Author your own domain pack

A domain pack is two files in a directory:

my_pack/
├── pack.toml      # metadata + evidence rules + scoring rubric
└── lens.md        # the domain expert's reasoning lens (markdown)

Scaffold one:

deltasci init-pack neuroscience
# ... edit pack.toml and lens.md ...
deltasci validate-pack ./packs/neuroscience
deltasci run --pack ./packs/neuroscience --idea "..."

See docs/AUTHORING_DOMAIN_PACKS.md for the full guide.

Comparison

Tool	Scope	Open source	License	Domain-pluggable	Falsifiability gate
DeltaScience	Hypothesis ideation	✅	MIT	✅ packs	✅ hard requirement
ChatGPT / Claude direct	Generic chat	n/a	n/a	manual prompts	❌
AI Scientist (Sakana)	Full paper generation	✅	Apache 2.0	partial	❌
Coscientist (CMU)	Chemistry experiment design	partial	research	locked	partial
Galactica / scite / Elicit	Literature retrieval	✅/❌	mixed	n/a	n/a

DeltaScience deliberately occupies a small niche: get to a defensible hypothesis. It hands off to your favourite paper-writing or experiment-design tool downstream.

Library API

from deltasci import CoReasoner, Config, load_pack
from deltasci.llm import get_adapter

# Load a built-in domain pack and an LLM adapter (the provider key is read from the environment).
pack = load_pack("biomed")
llm = get_adapter("anthropic")
reasoner = CoReasoner(pack=pack, llm=llm, config=Config(num_rounds=4))

# Run the four-round dialogue and synthesis for one idea.
result = reasoner.run(idea="Predict checkpoint-immunotherapy non-response in TFE3-fusion osteosarcoma from spatial transcriptomics.")

# Inspect the synthesized hypothesis.
print(result.hypothesis.title)
print(result.hypothesis.falsifiability.threshold)
print(result.hypothesis.feasibility_scores.overall)

The full hypothesis schema is documented in docs/ARCHITECTURE.md.

Configuration

Environment variables:

Variable	Default	Notes
`DELTASCI_LLM_PROVIDER`	auto	`anthropic`, `openai`, `mock`, or `auto`
`DELTASCI_MODEL`	provider default	model id override
`DELTASCI_OUTPUT_DIR`	`./deltasci-output`	where outputs are written
`ANTHROPIC_API_KEY`	—	required for `--llm anthropic`
`OPENAI_API_KEY`	—	required for `--llm openai`

CLI flags override env vars (--llm, --model, --out).
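The precedence rule is small enough to state in code. This Python sketch resolves one setting in the order CLI flag, environment variable, documented default. The `resolve` helper is hypothetical, and treating an empty environment variable as unset is an assumption, not documented behaviour.

```python
import os

# Defaults copied from the table above; DELTASCI_MODEL has no fixed default
# (it falls back to the provider's own), so it is omitted here.
DEFAULTS = {
    "DELTASCI_LLM_PROVIDER": "auto",
    "DELTASCI_OUTPUT_DIR": "./deltasci-output",
}


def resolve(setting, cli_value=None, env=None):
    """CLI flag wins, then the environment variable, then the default (or None)."""
    env = os.environ if env is None else env
    if cli_value is not None:
        return cli_value
    # Assumption: an empty environment variable counts as unset.
    return env.get(setting) or DEFAULTS.get(setting)


if __name__ == "__main__":
    print(resolve("DELTASCI_LLM_PROVIDER"))
    print(resolve("DELTASCI_OUTPUT_DIR", cli_value="./runs"))
```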

Privacy & ethics

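DeltaScience runs locally and does not phone home. The only outbound traffic goes to the LLM provider you choose (if any) and to the public record services queried during verification (PubMed, Crossref, OpenAlex, arXiv, DataCite, GitHub). No telemetry, no user accounts, no server.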

If you use it for clinical, regulatory, or high-stakes research work, the falsifiability gate is not a substitute for IRB review, regulatory pathway analysis, or clinical validation. The tool is an ideation aid, not a decision-maker.

Citation

If DeltaScience helps your research, please cite it as:

@software{deltascience2026,
  title  = {DeltaScience: Two-Perspective Co-Reasoning for AI4Science Hypothesis Generation},
  author = {{DeltaScience contributors}},
  year   = {2026},
  url    = {https://github.com/boheling/deltasci},
  note   = {Version 0.1.0}
}

Contributing

Contributions welcome — especially new domain packs. See CONTRIBUTING.md. The fastest path:

Open a domain pack proposal issue.
Author the pack (≈50 LOC of TOML + 1 markdown file).
Open a PR with validate-pack output and one example transcript.

License

MIT — see LICENSE.
