Skip to content

feat(v0.2.0): reference scholar + synthesis_doc_builder#2

Merged
drknowhow merged 1 commit into
mainfrom
feat/v0.2.0-reference-impl
Jun 2, 2026
Merged

feat(v0.2.0): reference scholar + synthesis_doc_builder#2
drknowhow merged 1 commit into
mainfrom
feat/v0.2.0-reference-impl

Conversation

@drknowhow

Copy link
Copy Markdown
Owner

Summary

Ports the v0.1 reference implementation (scholar adapter + synthesis
doc builder) into the public deep-research repo with agent-runtime
couplings stripped behind clean abstraction surfaces.

What ships

  • lib/scholar.py — stdlib-urllib adapter over six free academic
    APIs (OpenAlex, Semantic Scholar, PubMed, arXiv, Europe PMC, Crossref)
    plus Unpaywall. Five actions: search, multi_search, get,
    find_doi, resolve_oa. Uniform normalized hit schema.
  • lib/synthesis_doc_builder.pypython-docx + matplotlib
    helper that renders forest plot, PRISMA flow, stance heat-table, and
    assembles a structured .docx with native heading hierarchy and
    tables.
  • pyproject.toml — installable package. Core deps: stdlib only.
  • tests/ — 21 stdlib unittest tests. Network-free. pathlib.Path
    everywhere (no \ separator literals — Linux-CI safe).
  • CHANGELOG.md — first entry for the repo; covers 0.2.0, 0.1.1,
    0.1.0.

Abstraction surfaces (the only API changes that matter)

Coupling stripped New surface Default behavior
Hard-coded polite-pool mailto=… + User-Agent scholar.configure(contact_email, app_name) + SCHOLAR_CONTACT_EMAIL env UA = deep-research-scholar/1.0; no mailto sent
Direct import of an in-house embeddings module scholar.set_embedding_deduper(fn) No-op pass-through; hash dedup remains
Drive upload coupled inside the builder build_synthesis_doc(inputs, *, uploader: Callable[[Path, str, str], dict]) No upload; local .docx returned, uploaded=False
matplotlib + python-docx as hard deps Soft imports; raise RuntimeError on use Install via pip install "deep-research[viz]"

Test plan

  • python -m unittest discover tests — 21/21 pass locally (Python
    3.13, Windows, with [viz] extras installed).
  • Surname / yepgent.com / dimitri@ grep across lib/ and root
    — clean.
  • CI on Linux runner — green required before merge.
  • python -c "import lib.scholar; lib.scholar.scholar('search', {'source':'openalex','query':'test','limit':1})" against the real OpenAlex API after merge.

Out of scope

  • No changes to SKILL.md protocol semantics — only the v0.1 "lands in
    v0.2" hedge was removed.
  • No changes to schema/ or agents/.
  • No version bump on the manifest schema (still manifest_version: 0.4).

🤖 Generated with Claude Code

Ports the Python reference implementation into the public repo with
agent-runtime couplings stripped.

lib/scholar.py
  - stdlib-urllib adapter over OpenAlex, Semantic Scholar, PubMed,
    arXiv, Europe PMC, Crossref, and Unpaywall.
  - Five actions: search, multi_search, get, find_doi, resolve_oa.
  - Uniform normalized hit schema across all sources.
  - Polite-pool contact email is no longer hard-coded:
    * configure(contact_email, app_name) sets module-global UA + mailto.
    * SCHOLAR_CONTACT_EMAIL env var honored at import time.
    * Without configuration, mailto params are omitted (APIs still work,
      polite-pool benefits forfeited).
  - Embedding-based dedup in multi_search is now pluggable via
    set_embedding_deduper(fn) — no hard dep on any embeddings module.
    Unregistered = no-op pass-through; hash dedup remains the safety net.

lib/synthesis_doc_builder.py
  - python-docx + matplotlib helper that renders forest plot, PRISMA
    flow, stance heat-table, and assembles the .docx with native heading
    hierarchy + tables.
  - Drive upload decoupled behind a DI Uploader callable:
        Uploader = Callable[[Path, str, str], dict]
        build_synthesis_doc(inputs, *, uploader=None)
    Without an uploader the helper returns the local .docx path and
    uploaded=False; with one, doc_id / web_url come back populated.
  - matplotlib + python-docx remain soft imports; RuntimeError on use
    when missing, not ImportError at load.

pyproject.toml
  - Installable package. Core deps: stdlib only.
  - [viz] extra: python-docx>=1.0, matplotlib>=3.7, numpy>=1.24.

tests/
  - test_scholar_smoke.py — 14 tests. urllib.request.urlopen monkey-
    patched with per-source fake responses. Confirms normalized hit
    schema, configure() mutates UA + email, find_doi best-match, OA
    flattening, dedup behavior. Network-free.
  - test_synthesis_doc_builder_smoke.py — 7 tests. tempfile.mkdtemp +
    pathlib.Path only (no \ separator literals — Linux CI safe).
    Asserts .docx valid zip with word/document.xml containing heading
    text + table tag; uploader called with correct args + name;
    uploader exception preserves local artifact. matplotlib/python-docx
    tests SkipTest cleanly when soft deps absent.

Docs
  - README "What ships in v0.1.0" -> "What ships in v0.2.0"; new
    "What's new in v0.2.0" section; pip install "deep-research[viz]"
    in Quickstart.
  - SKILL.md: drops the "when ported in v0.2" hedge; bumps version: 0.2.0.
  - manifests/deep-research.v0.4.json: tool.version 0.1.1 -> 0.2.0;
    runtime.install.ref v0.1.0 -> v0.2.0; description rewritten to
    name the shipped library surface.
  - CHANGELOG.md: new file with 0.2.0, 0.1.1, 0.1.0 entries.

Abstraction surfaces in one breath
  - Polite-pool: configure() + env var; no embedded contact details.
  - Embedding dedup: register a runtime-specific function or accept the
    no-op fallback.
  - Document upload: pass an Uploader or accept "local file only".
  - Soft deps: matplotlib + python-docx live behind [viz]; tests skip.

21/21 tests pass locally on Python 3.13 (Windows + matplotlib + python-docx
installed). No surname or yepgent.com references in the shipped tree.
Copilot AI review requested due to automatic review settings June 2, 2026 10:37
@drknowhow drknowhow merged commit 485dfc1 into main Jun 2, 2026
1 check passed
@drknowhow drknowhow deleted the feat/v0.2.0-reference-impl branch June 2, 2026 10:38

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Ports the v0.1 reference implementation into this repo by adding a stdlib-only scholarly search adapter (lib/scholar.py) and an optional-deps synthesis .docx builder (lib/synthesis_doc_builder.py), plus packaging/docs/tests updates to support a v0.2.0 release.

Changes:

  • Add lib/scholar.py (multi-source search/get/find_doi/resolve_oa with normalized hit schema) and lib/synthesis_doc_builder.py (docx + plots with optional uploader injection).
  • Add stdlib unittest smoke tests for both modules (network-free via monkeypatched urlopen).
  • Introduce pyproject.toml packaging and update README/SKILL/manifest/CHANGELOG to v0.2.0.

Reviewed changes

Copilot reviewed 10 out of 11 changed files in this pull request and generated 8 comments.

Show a summary per file
File Description
lib/scholar.py New unified scholarly API adapter with search, multi_search, get, find_doi, and resolve_oa actions.
lib/synthesis_doc_builder.py New synthesis document builder producing .docx and optional plots, with optional uploader callback.
lib/__init__.py Introduces library package marker + version string.
tests/test_scholar_smoke.py Adds network-free smoke tests for scholar behaviors and normalization.
tests/test_synthesis_doc_builder_smoke.py Adds smoke tests for doc builder with soft-dep skipping.
tests/__init__.py Marks tests as a package (empty).
pyproject.toml Adds installable packaging config and [viz] extras for doc builder deps.
README.md Updates “what ships”, quickstart, and v0.2.0 feature descriptions.
SKILL.md Bumps skill version and updates synthesis builder documentation.
manifests/deep-research.v0.4.json Bumps tool version/ref and updates manifest description for v0.2.0.
CHANGELOG.md Adds changelog entries up through v0.2.0.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread pyproject.toml
readme = "README.md"
requires-python = ">=3.11"
license = { text = "Apache-2.0" }
authors = [{ name = "Dimitri T", email = "" }]
Comment thread lib/scholar.py
Comment on lines +1061 to +1064
key = (
f"doi:{h.get('doi')}" if h.get("doi") else
(f"title:{_title_hash_key(h.get('title'))}" if h.get("title") else None)
)
Comment thread lib/scholar.py
# Per-source: arXiv (Atom XML over export.arxiv.org)
# ---------------------------------------------------------------------------

ARXIV_BASE = "http://export.arxiv.org/api/query"
Comment on lines +606 to +613
result: dict[str, Any] = {
"local_docx_path": str(docx_path),
"plots": {k: str(v) for k, v in plots.items()},
"doc_id": None,
"web_url": None,
"uploaded": False,
"upload_error": None,
}
Comment thread README.md
Comment on lines +74 to +75
A working reference of both modules also lives upstream in the
[Yep agent](https://yepgent.com) codebase.
Comment on lines +96 to +100
def setUp(self):
if not _has_python_docx():
raise unittest.SkipTest("python-docx not installed (viz extra)")
self.tmp = Path(tempfile.mkdtemp(prefix="dr_smoke_"))

Comment thread lib/scholar.py


def _normalize_crossref_item(it: dict) -> dict:
doi = _normalize_doi(it.get("DOI"))
Comment thread lib/scholar.py
elif crossref_type in ("review-article",):
tier_hint = 3
return {
"id": f"crossref:{doi or ''}",
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants