Releases: Agents365-ai/paper-fetch

v0.5.0 — Agent-native CLI (real)

11 Apr 08:39

The agent-native release

paper-fetch is now a reference-quality agent-native CLI. Scored against the agent-native CLI rubric, the tool moves from 23 / 28 (partially agent-native) to 28 / 28, clearing every one of the seven principles: structured output, delegated auth, self-description, graduated safety, boundary validation, schema-as-truth, and three-audience design.

The resolution chain, host allowlist, and download hardening are unchanged. The contract, self-description, and failure routing are what changed.

Highlights

Self-description

  • New schema subcommand. python scripts/fetch.py schema emits a full machine-readable description of the CLI: parameters, types, exit codes, error codes, envelope shapes, environment variables. Runs offline. Agents should cache it against schema_version and re-read on drift.
  • Every envelope carries a meta slot. request_id, latency_ms, schema_version, cli_version, sources_tried. Orchestrators can correlate logs, track SLOs, and detect schema drift without parsing prose.
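
A minimal sketch of the cache-and-check pattern described above, from the consumer's side. It assumes the envelope has already been parsed into a dict, that meta.schema_version is a plain string, and that the schema subcommand prints JSON on stdout. The helper names schema_is_stale and load_schema are hypothetical, not part of paper-fetch.

```python
import json
import subprocess
import sys


def schema_is_stale(envelope: dict, cached_schema_version) -> bool:
    """Return True when the envelope reports a schema_version that differs from the cache."""
    reported = envelope.get("meta", {}).get("schema_version")
    # A missing version is treated as "no evidence of drift" (an assumption of this sketch).
    return reported is not None and reported != cached_schema_version


def load_schema(script: str = "scripts/fetch.py") -> dict:
    """Re-read the schema via the documented `schema` subcommand (runs offline).

    Assumes the subcommand writes a single JSON document to stdout.
    """
    proc = subprocess.run(
        [sys.executable, script, "schema"],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)
```

An orchestrator would call schema_is_stale on each envelope and only pay for load_schema when it returns True.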

Output contract

  • ok: "partial" on mixed batches, with a next slot suggesting the exact command that retries only the failed subset.
  • TTY-aware --format default: json when stdout is piped or captured, text in a terminal. An explicit --format overrides it. Agents no longer have to remember the flag.
  • --stream NDJSON mode — one result per line as each DOI resolves, then a final summary line. Useful for long batches.
  • --pretty for indented JSON.
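
A small sketch of how a consumer might branch on the three-valued ok field without falling into the truthiness trap. It assumes next sits at the top level of the envelope; its exact shape is not specified here, so the sketch passes it through untouched. The function names are hypothetical.

```python
def classify_envelope(envelope: dict) -> str:
    """Map the `ok` field to 'success', 'partial', or 'failure'."""
    ok = envelope.get("ok")
    if ok is True:
        return "success"
    if ok == "partial":
        return "partial"
    return "failure"


def retry_hint(envelope: dict):
    """Return the `next` slot on a partial result, otherwise None.

    The location of `next` at the top level is an assumption of this sketch.
    """
    if classify_envelope(envelope) == "partial":
        return envelope.get("next")
    return None
```

Checking ok is True first keeps a mixed batch from being mistaken for a clean one.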

Progress and observability

  • NDJSON progress events on stderr in JSON mode: start, source_try, source_hit, source_miss, source_skip, source_enrich, source_enrich_failed, download_ok, download_error, download_skip, dry_run, not_found, update_check_spawned. Every event carries request_id and elapsed_ms for correlation with the final stdout envelope.
  • Auto-update now emits update_check_spawned so the silent background git pull is observable.
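
A sketch of consuming the stderr event stream and grouping it by request_id for correlation with the stdout envelope. The request_id and elapsed_ms fields come from the notes above; the key that holds the event name is not stated, so the "event" key in the usage example is an assumption. Lines that are not JSON (for example a plain-text warning) are skipped rather than treated as errors.

```python
import json
from collections import defaultdict


def group_events(stderr_lines):
    """Group NDJSON progress events by request_id, skipping blank and non-JSON lines."""
    grouped = defaultdict(list)
    for line in stderr_lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # not an event, e.g. a human-readable warning
        if isinstance(event, dict) and "request_id" in event:
            grouped[event["request_id"]].append(event)
    return dict(grouped)
```

For instance, feeding it two events with request_id "r1" and one stray warning line yields a single "r1" entry holding both events in arrival order.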

Retries and idempotency

  • --idempotency-key KEY — re-running with the same key replays the original envelope from <out>/.paper-fetch-idem/, re-stamps meta.request_id and meta.latency_ms, and sets meta.replayed_from_idempotency_key. No network I/O.
  • Skip-if-exists by default; --overwrite to force re-download.
  • not_found is now retryable: true with retry_after_hours: 168, since OA availability changes over time (embargoes lift, preprints appear).
  • Metadata enrichment from Semantic Scholar when Unpaywall returns a PDF URL but omits author/title — filename goes from unknown_2021_Highly_accurate_protein_structure_predic.pdf to Jumper_2021_Highly_accurate_protein_structure_predic.pdf.
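
A sketch of how an orchestrator might schedule a retry from the retryable and retry_after_hours fields. It assumes both fields live on a per-error dict; where exactly they sit in the envelope should be read from the schema subcommand. The function name next_retry_at is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def next_retry_at(error: dict, now: datetime):
    """Return the earliest time to retry, or None if the error is not retryable."""
    if not error.get("retryable"):
        return None
    # A missing retry_after_hours is read as "retry immediately" (assumption).
    hours = error.get("retry_after_hours", 0)
    return now + timedelta(hours=hours)
```

With retry_after_hours: 168, a not_found seen at midnight UTC on 1 January is due again at midnight UTC on 8 January.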

Exit code taxonomy

  • 0 success, 1 unresolved (no OA copy; not retryable now), 2 reserved for auth, 3 validation, 4 transport (network/IO, retryable). Orchestrators can now route failures deterministically without parsing JSON.
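
A sketch of routing on the exit code alone. The numeric values are the ones listed above; the action names returned by route are hypothetical and would be replaced by whatever the orchestrator actually does.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    SUCCESS = 0
    UNRESOLVED = 1   # no OA copy found; not retryable right now
    AUTH = 2         # reserved for auth
    VALIDATION = 3
    TRANSPORT = 4    # network/IO, retryable


def route(exit_code: int) -> str:
    """Map a process exit code to a (hypothetical) orchestrator action."""
    try:
        code = ExitCode(exit_code)
    except ValueError:
        return "inspect"  # unknown code: fall back to reading the envelope
    return {
        ExitCode.SUCCESS: "done",
        ExitCode.UNRESOLVED: "retry_later",
        ExitCode.AUTH: "fix_credentials",
        ExitCode.VALIDATION: "fix_input",
        ExitCode.TRANSPORT: "retry_now",
    }[code]
```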

New flags and environment

  • --timeout SECONDS — override the default 30s HTTP timeout.
  • paper-fetch - / --batch - — read DOIs line-by-line from stdin.
  • PAPER_FETCH_ALLOWED_HOSTS — comma-separated hosts that extend the hard-coded download allowlist.
  • --version reports paper-fetch <cli> (schema <schema>).

Backwards compatibility

Existing invocations still work:

  • python scripts/fetch.py <doi>
  • python scripts/fetch.py --batch dois.txt --out ./papers
  • python scripts/fetch.py <doi> --dry-run
  • python scripts/fetch.py <doi> --format text

Things an older consumer might trip on:

  • response.ok can now be the string "partial" on mixed batches. Truthy in every language I care about, but a strict check like response.ok === true would miss partial.
  • Exit code 1 previously meant "any runtime failure"; it now means specifically "unresolved" (no OA copy found). Transport failures are now exit 4. An orchestrator that hardcoded exit == 1 to mean "some DOIs failed due to anything" will miss transport failures and should switch to exit != 0 or check the stdout envelope.

Versioning

  • CLI_VERSION 0.3.0 → 0.5.0 (skipping 0.4.x because an unrelated v0.4.0 tag already existed on origin pointing at a different commit)
  • SCHEMA_VERSION 1.1.0 — stable. The new stderr events and meta fields are additive. Agents caching schema against schema_version do not need to re-discover.

Auto-update

Users who installed via git clone will pick this up automatically on the second invocation after their 24h cooldown elapses (the first invocation spawns the background pull; the second sees the new code). Force immediately with rm <skill_dir>/.git/.paper-fetch-last-update. Disable with PAPER_FETCH_NO_AUTO_UPDATE=1.

v0.4.0 — Silent background self-update

11 Apr 03:08
f7c9359

Highlights

Silent background self-update (#3)

When installed via git clone, paper-fetch now keeps itself in sync with upstream automatically — no cron, no launchd, no hooks, no manual git pull. Each invocation spawns a detached background git pull --ff-only in the skill directory.

  • Non-blocking — current call is not delayed; start_new_session=True fully detaches
  • Silent — all output to /dev/null; JSON contract on stdout is never polluted
  • Throttled — at most once per 24h via .git/.paper-fetch-last-update
  • Safe: --ff-only refuses to merge if you have local edits
  • Convergence — updates apply on the next invocation

Kill switches:

export PAPER_FETCH_NO_AUTO_UPDATE=1       # disable entirely
export PAPER_FETCH_UPDATE_INTERVAL=3600   # 1h cooldown instead of 24h
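
A sketch of the throttle-then-detach pattern described above, not the code in scripts/fetch.py. It assumes the cooldown is tracked through the stamp file's modification time and that a missing stamp means an update is due. The function names update_due and spawn_update are hypothetical; the environment variables and the start_new_session=True detail come from these notes.

```python
import os
import subprocess
import time
from pathlib import Path

DEFAULT_INTERVAL = 24 * 3600  # 24h cooldown, in seconds


def update_due(stamp: Path, environ=os.environ, now=None) -> bool:
    """Decide whether a background pull should be spawned on this invocation."""
    if environ.get("PAPER_FETCH_NO_AUTO_UPDATE") == "1":
        return False
    interval = int(environ.get("PAPER_FETCH_UPDATE_INTERVAL", DEFAULT_INTERVAL))
    now = time.time() if now is None else now
    try:
        last = stamp.stat().st_mtime
    except FileNotFoundError:
        return True  # no stamp yet (or it was removed to force an update)
    return now - last >= interval


def spawn_update(skill_dir: Path) -> None:
    """Start a detached, silent `git pull --ff-only`; the caller does not wait for it."""
    subprocess.Popen(
        ["git", "pull", "--ff-only"],
        cwd=skill_dir,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        start_new_session=True,  # detach from the caller's session (POSIX)
    )
```

Because the pull runs after the current process has already loaded its code, new code only takes effect on a later invocation.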

Online landing page (#2)

Live at https://agents365-ai.github.io/paper-fetch/ (English) and zh.html (中文). Includes hero, features, 5-source resolution walkthrough, discipline coverage table, comparison vs native agent, and install tabs for all 6 platforms.

Discipline-agnostic clarification (#1)

New "Discipline Coverage" / "学科覆盖" section in both READMEs. The skill works for any field — humanities, social sciences, chemistry, economics, psychology, materials — not just life sciences or CS. Unpaywall + Semantic Scholar cover every Crossref DOI; arXiv/PMC/bioRxiv are additional fallbacks for their specific domains.

Upgrading

Existing installs: nothing to do. On your next paper-fetch call (or within 24h for already-updated installs), the background self-updater will pull this release. From v0.4.0 onward, all future updates are fully automatic.

New installs: unchanged.

git clone https://github.com/Agents365-ai/paper-fetch.git ~/.claude/skills/paper-fetch

Full changelog

  • feat: add silent background self-update on each invocation (#3)
  • docs: add online landing page (GitHub Pages) (#2)
  • docs: clarify that paper-fetch is discipline-agnostic (#1)

Full diff: v0.3.0...v0.4.0

v0.3.0 — Agent-native CLI

10 Apr 16:20

What's Changed

Agent-native CLI (scripts/fetch.py)

  • Structured JSON output — stable {"ok": bool, "data": ...} envelope on stdout
  • --format json|text — JSON for agents (default), text for humans
  • --dry-run — resolve sources without downloading
  • Distinct exit codes — 0=success, 1=runtime, 3=validation
  • Granular error codes: not_found, download_network_error, download_not_a_pdf, download_host_not_allowed, download_size_exceeded, download_io_error, internal_error
  • UNPAYWALL_EMAIL now optional — warns on stderr, skips Unpaywall, remaining 4 sources still work
  • Download safety — host allowlist (27 known OA domains) + 50 MB per-PDF size limit
  • Top-level exception wrapper — no tracebacks on stdout
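
A sketch of the kind of check behind download_size_exceeded and download_not_a_pdf. It assumes the body is read in chunks from a file-like object and that "50 MB" means 50 × 1024 × 1024 bytes; neither detail is confirmed here. The %PDF- prefix is the standard PDF file signature, but whether paper-fetch tests for it this way is also an assumption. All names are hypothetical.

```python
MAX_PDF_BYTES = 50 * 1024 * 1024  # assumed interpretation of the 50 MB limit


class SizeExceeded(Exception):
    """Raised when a response body grows past the configured limit."""


def read_capped(stream, limit: int = MAX_PDF_BYTES, chunk_size: int = 64 * 1024) -> bytes:
    """Read a file-like object fully, aborting as soon as the limit is crossed."""
    buf = bytearray()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buf.extend(chunk)
        if len(buf) > limit:
            raise SizeExceeded(f"body exceeds {limit} bytes")
    return bytes(buf)


def looks_like_pdf(data: bytes) -> bool:
    """Check for the PDF file signature at the start of the body."""
    return data.startswith(b"%PDF-")
```

Reading in chunks means an oversized response is abandoned early instead of being buffered whole.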

Platform support

  • Added pi-mono (metadata.pimo namespace)

Documentation

  • Partial-failure batch example in SKILL.md
  • --dry-run and --format text examples in both READMEs
  • Support and Author sections

Full Changelog: v0.1.0...v0.3.0

v0.1.0 — Initial release

08 Apr 09:17

paper-fetch v0.1.0

Legal open-access PDF downloader for academic papers.

Features

  • 5-source fallback chain: Unpaywall → Semantic Scholar openAccessPdf → arXiv → PubMed Central OA → bioRxiv/medRxiv
  • Zero dependencies — pure Python standard library
  • Batch mode: --batch dois.txt
  • Auto-named output: {author}_{year}_{title}.pdf
  • Never uses Sci-Hub or any paywall-bypass service
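
A sketch of the fallback chain and the output naming listed above. The resolver callables are placeholders: each takes a DOI and returns a PDF URL or None, and nothing here calls a real API. The 40-character title cut-off and the "unknown" author fallback are inferred from example filenames in the v0.5.0 notes and should be treated as assumptions, as should the function names.

```python
import re


def resolve(doi, resolvers):
    """Try each (name, resolver) pair in order.

    Returns (source_name, url, sources_tried); source_name and url are None
    when every resolver comes back empty.
    """
    tried = []
    for name, resolver in resolvers:
        tried.append(name)
        url = resolver(doi)
        if url:
            return name, url, tried
    return None, None, tried


def build_filename(author, year, title, max_title=40):
    """Build `{author}_{year}_{title}.pdf` with a filesystem-safe, truncated title.

    Non-alphanumeric runs collapse to a single underscore; this ASCII-only rule
    is a simplification made for the sketch.
    """
    safe_title = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")[:max_title]
    return f"{author or 'unknown'}_{year}_{safe_title}.pdf"
```

Stopping at the first hit keeps the chain cheap, and the list of sources tried is exactly what an envelope would need to report how far resolution got.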

Multi-Platform Support

Claude Code · OpenClaw / ClawHub · Hermes Agent · OpenAI Codex · SkillsMP

Setup

export UNPAYWALL_EMAIL=you@example.com
python scripts/fetch.py 10.1038/s41586-021-03819-2