The agent-native release
paper-fetch is now a reference-quality agent-native CLI. Scored against the agent-native CLI rubric, the tool moves from 23 / 28 (partially agent-native) to 28 / 28, clearing every one of the seven principles: structured output, delegated auth, self-description, graduated safety, boundary validation, schema-as-truth, and three-audience design.
The resolution chain, host allowlist, and download hardening are unchanged. The contract, self-description, and failure routing are what changed.
Highlights
Self-description
- New `schema` subcommand. `python scripts/fetch.py schema` emits a full machine-readable description of the CLI: parameters, types, exit codes, error codes, envelope shapes, environment variables. Runs offline. Agents should cache it against `schema_version` and re-read on drift.
- Every envelope carries a `meta` slot: `request_id`, `latency_ms`, `schema_version`, `cli_version`, `sources_tried`. Orchestrators can correlate logs, track SLOs, and detect schema drift without parsing prose.
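One way an agent might apply the "cache against `schema_version`, re-read on drift" advice is sketched below. It assumes the cached schema document exposes a top-level `schema_version` key, which these notes do not confirm; `needs_schema_refresh` is a hypothetical helper name, and the `1.2.0` version is made up for the example.

```python
def needs_schema_refresh(cached_schema, envelope):
    """Return True when the agent should re-run the `schema` subcommand.

    Assumption: the cached schema document has a top-level "schema_version"
    key that mirrors meta.schema_version in every envelope.
    """
    if cached_schema is None:
        return True  # nothing cached yet
    seen = envelope.get("meta", {}).get("schema_version")
    return seen != cached_schema.get("schema_version")


cached = {"schema_version": "1.1.0"}
same = {"ok": True, "meta": {"schema_version": "1.1.0"}}
drifted = {"ok": True, "meta": {"schema_version": "1.2.0"}}

print(needs_schema_refresh(cached, same))     # False
print(needs_schema_refresh(cached, drifted))  # True
```

Because the check only reads `meta`, it can run on every envelope without extra process spawns.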
Output contract
- `ok: "partial"` on mixed batches, with a `next` slot suggesting the exact command that retries only the failed subset.
- TTY-aware `--format` default: `json` when stdout is piped or captured, `text` in a terminal. Explicit `--format` overrides. Agents no longer have to remember the flag.
- `--stream` NDJSON mode: one result per line as each DOI resolves, then a final summary line. Useful for long batches.
- `--pretty` for indented JSON.
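The TTY-aware default can be pictured with a few lines of standard-library Python. This is a sketch of the general technique, not the code in `scripts/fetch.py`; `default_format` is a hypothetical name.

```python
import io
import sys


def default_format(stream=None, explicit=None):
    """Pick an output format: an explicit choice wins, else look at the stream."""
    if explicit is not None:
        return explicit
    stream = sys.stdout if stream is None else stream
    # isatty() is False for pipes, files, and in-memory buffers.
    return "text" if stream.isatty() else "json"


captured = io.StringIO()  # stands in for piped or captured stdout
print(default_format(captured))                   # json
print(default_format(captured, explicit="text"))  # text
```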
Progress and observability
- NDJSON progress events on stderr in JSON mode: `start`, `source_try`, `source_hit`, `source_miss`, `source_skip`, `source_enrich`, `source_enrich_failed`, `download_ok`, `download_error`, `download_skip`, `dry_run`, `not_found`, `update_check_spawned`. Every event carries `request_id` and `elapsed_ms` for correlation with the final stdout envelope.
- Auto-update now emits `update_check_spawned` so the silent background `git pull` is observable.
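An orchestrator could correlate stderr events with the stdout envelope roughly as follows. The `request_id` and `elapsed_ms` fields come from these notes; the `event` key holding the event name is an assumption, as are the sample values and the helper name `events_for_request`.

```python
import json


def events_for_request(stderr_text, request_id):
    """Collect the NDJSON events on stderr that belong to one request."""
    events = []
    for line in stderr_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # tolerate non-JSON noise on stderr
        if isinstance(event, dict) and event.get("request_id") == request_id:
            events.append(event)
    return events


# Made-up sample; the "event" key name is an assumption.
sample = "\n".join([
    '{"event": "start", "request_id": "abc", "elapsed_ms": 0}',
    'warning: not json',
    '{"event": "source_try", "request_id": "abc", "elapsed_ms": 12}',
    '{"event": "start", "request_id": "other", "elapsed_ms": 0}',
])

matched = events_for_request(sample, "abc")
print([e["event"] for e in matched])  # ['start', 'source_try']
```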
Retries and idempotency
- `--idempotency-key KEY`: re-running with the same key replays the original envelope from `<out>/.paper-fetch-idem/`, re-stamps `meta.request_id` and `meta.latency_ms`, and sets `meta.replayed_from_idempotency_key`. No network I/O.
- Skip-if-exists by default; `--overwrite` to force re-download.
- `not_found` is now `retryable: true` with `retry_after_hours: 168`, since OA availability changes over time (embargoes lift, preprints appear).
- Metadata enrichment from Semantic Scholar when Unpaywall returns a PDF URL but omits author/title: filename goes from `unknown_2021_Highly_accurate_protein_structure_predic.pdf` to `Jumper_2021_Highly_accurate_protein_structure_predic.pdf`.
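The replay behaviour of `--idempotency-key` can be sketched as a small store-then-replay wrapper. Only the directory name `.paper-fetch-idem` and the three `meta` fields come from these notes; the hashed file name, the value written to `replayed_from_idempotency_key`, and the `fetch` callable are illustrative assumptions, not the tool's actual implementation.

```python
import hashlib
import json
import tempfile
import time
import uuid
from pathlib import Path


def run_with_idempotency(out_dir, key, fetch):
    """Replay a stored envelope for `key`, or call `fetch()` and store the result."""
    started = time.monotonic()
    store = Path(out_dir) / ".paper-fetch-idem"
    store.mkdir(parents=True, exist_ok=True)
    # Assumption: one JSON file per key, named by a hash so any key is path-safe.
    record = store / (hashlib.sha256(key.encode("utf-8")).hexdigest() + ".json")

    if record.exists():
        envelope = json.loads(record.read_text())
        # Assumption: the field holds the key itself.
        envelope["meta"]["replayed_from_idempotency_key"] = key
    else:
        envelope = fetch()  # the only branch that would touch the network
        envelope.setdefault("meta", {})
        record.write_text(json.dumps(envelope))

    # Re-stamp per-invocation fields on both fresh and replayed envelopes.
    envelope["meta"]["request_id"] = uuid.uuid4().hex
    envelope["meta"]["latency_ms"] = int((time.monotonic() - started) * 1000)
    return envelope


with tempfile.TemporaryDirectory() as out:
    first = run_with_idempotency(out, "nightly-batch", lambda: {"ok": True})
    second = run_with_idempotency(out, "nightly-batch", lambda: {"ok": True})
    print("replayed_from_idempotency_key" in second["meta"])  # True
```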
Exit code taxonomy
`0` success, `1` unresolved (no OA copy; not retryable now), `2` reserved for auth, `3` validation, `4` transport (network/IO, retryable). Orchestrators can now route failures deterministically without parsing JSON.
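A minimal routing table over the exit codes might look like the sketch below. The numeric codes are taken from the taxonomy above; the action names (`done`, `retry_later`, and so on) are hypothetical labels an orchestrator would choose for itself.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    UNRESOLVED = 1   # no OA copy; not retryable now
    AUTH = 2         # reserved
    VALIDATION = 3
    TRANSPORT = 4    # network/IO, retryable


# Hypothetical orchestrator actions, keyed by exit code.
ROUTES = {
    ExitCode.SUCCESS: "done",
    ExitCode.UNRESOLVED: "retry_later",
    ExitCode.AUTH: "escalate",
    ExitCode.VALIDATION: "fix_input",
    ExitCode.TRANSPORT: "retry_now",
}


def route(returncode):
    """Map a process exit code to an action; unknown codes are escalated."""
    try:
        return ROUTES[ExitCode(returncode)]
    except ValueError:
        return "escalate"


print(route(4))   # retry_now
print(route(1))   # retry_later
print(route(99))  # escalate
```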
New flags and environment
- `--timeout SECONDS`: override the default 30s HTTP timeout.
- `paper-fetch -` / `--batch -`: read DOIs line-by-line from stdin.
- `PAPER_FETCH_ALLOWED_HOSTS`: comma-separated hosts that extend the hard-coded download allowlist.
- `--version` reports `paper-fetch <cli> (schema <schema>)`.
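How a comma-separated `PAPER_FETCH_ALLOWED_HOSTS` value could extend a built-in allowlist is sketched here. The hosts in `BASE_ALLOWLIST` are placeholders, not the tool's real list, and the trimming and lower-casing are assumptions about reasonable parsing rather than documented behaviour.

```python
import os

# Placeholder hosts; the tool's real hard-coded allowlist is not shown in these notes.
BASE_ALLOWLIST = frozenset({"oa.example.org", "pdfs.example.org"})


def effective_allowlist(environ=None):
    """Return the base allowlist plus any hosts from PAPER_FETCH_ALLOWED_HOSTS."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PAPER_FETCH_ALLOWED_HOSTS", "")
    # Assumption: surrounding whitespace and letter case are not significant.
    extra = {host.strip().lower() for host in raw.split(",") if host.strip()}
    return BASE_ALLOWLIST | extra


env = {"PAPER_FETCH_ALLOWED_HOSTS": "Mirror.example.edu, ,repo.example.net"}
print(sorted(effective_allowlist(env)))
# ['mirror.example.edu', 'oa.example.org', 'pdfs.example.org', 'repo.example.net']
```

The variable only ever adds hosts in this sketch, matching the "extend" wording above; it cannot remove a built-in entry.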
Backwards compatibility
Existing invocations still work:
- `python scripts/fetch.py <doi>`
- `python scripts/fetch.py --batch dois.txt --out ./papers`
- `python scripts/fetch.py <doi> --dry-run`
- `python scripts/fetch.py <doi> --format text`
Things an older consumer might trip on:
- `response.ok` can now be the string `"partial"` on mixed batches. Truthy in every language I care about, but `response.ok === true` would miss partial.
- Exit code `1` previously meant "any runtime failure"; it now means specifically "unresolved" (no OA copy found). Transport failures are now exit `4`. An orchestrator that hardcoded `exit == 1` to mean "some DOIs failed due to anything" will miss transport failures and should switch to `exit != 0` or check the stdout envelope.
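For consumers updating their `ok` handling, a three-way check along these lines avoids both pitfalls (treating `"partial"` as full success, or missing it with a strict identity test). `batch_outcome` is a hypothetical helper; only the `ok` values `true` and `"partial"` are taken from these notes, and anything else is treated as failure.

```python
def batch_outcome(envelope):
    """Classify an envelope as 'success', 'partial', or 'failure'."""
    ok = envelope.get("ok")
    if ok is True:
        return "success"
    if ok == "partial":
        return "partial"
    return "failure"


print(batch_outcome({"ok": True}))       # success
print(batch_outcome({"ok": "partial"}))  # partial
print(batch_outcome({"ok": False}))      # failure
```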
Versioning
- `CLI_VERSION` 0.3.0 → 0.5.0 (skipping 0.4.x because an unrelated `v0.4.0` tag already existed on origin pointing at a different commit)
- `SCHEMA_VERSION` 1.1.0: stable. The new stderr events and `meta` fields are additive. Agents caching schema against `schema_version` do not need to re-discover.
Auto-update
Users who installed via git clone will pick this up automatically on the second invocation after their 24h cooldown elapses (the first invocation spawns the background pull; the second sees the new code). Force immediately with rm <skill_dir>/.git/.paper-fetch-last-update. Disable with PAPER_FETCH_NO_AUTO_UPDATE=1.
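The cooldown logic described above can be approximated as below. This is a guess at the mechanism, assuming the marker file's modification time records the last update check; the notes only establish that deleting the marker forces a check and that `PAPER_FETCH_NO_AUTO_UPDATE=1` disables it. `update_check_due` is a hypothetical name.

```python
import os
import time
from pathlib import Path

COOLDOWN_SECONDS = 24 * 60 * 60  # the 24h cooldown from the notes


def update_check_due(marker, environ=None, now=None):
    """Decide whether a background update check should be spawned."""
    environ = os.environ if environ is None else environ
    if environ.get("PAPER_FETCH_NO_AUTO_UPDATE") == "1":
        return False  # auto-update disabled
    marker = Path(marker)
    if not marker.exists():
        return True  # marker removed (or never written): check immediately
    now = time.time() if now is None else now
    # Assumption: the marker's mtime is the time of the last check.
    return now - marker.stat().st_mtime >= COOLDOWN_SECONDS
```

Under this reading, removing the marker file is what makes the next invocation spawn the pull straight away.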