Skip to content

Fix broken literature_enhanced imports in two writer scripts#88

Merged
realmarcin merged 2 commits into
mainfrom
fix/broken-literature-imports
May 26, 2026
Merged

Fix broken literature_enhanced imports in two writer scripts#88
realmarcin merged 2 commits into
mainfrom
fix/broken-literature-imports

Conversation

@realmarcin
Copy link
Copy Markdown
Contributor

Summary

Fix a pre-existing import bug surfaced as a heads-up by PR #87.

scripts/add_evidence_source.py and scripts/intelligent_snippet_fixer.py both import EnhancedLiteratureFetcher from communitymech.literature_enhanced — a module that was never committed to git (only a stale .pyc in __pycache__/ was shadowing the missing source locally). Both scripts raise ModuleNotFoundError on import for anyone running them today.

Swap to LiteratureFetcher from communitymech.literature, which exists in the repo, is well-tested, and exposes the same fetch_pubmed_abstract + fetch_paper surface plus a richer DOI fallback chain (CrossRef → PubMed-via-DOI → PMC full-text → OpenAlex → Semantic Scholar → Europe PMC → publisher meta-tag scrape) that subsumes what the missing fetch_abstract_for_doi did.

API differences

Difference Impact
fetch_paper() returns (abstract, pdf_url) not a dict Tuple-unpack at call sites
fetch_paper() has no download_pdf kwarg The flag was effectively a no-op in LiteratureFetcher's pipeline (the PDF URL is just returned alongside the abstract)
Title field unavailable separately add_evidence_source.guess_evidence_source filter(None, …)-merged title with snippet + abstract; losing it degrades classification marginally. PubMed abstracts include the title in the abstract text, so PMID references are unaffected. If DOI title is needed later, fetch_doi_metadata() returns CrossRef metadata with a title field.
Constructor: no use_fallback_pdf arg Drop it; was a no-op

After-state

  • Both scripts now import + run their initialization paths cleanly (verified via importlib.util.spec_from_file_location + exec_module).
  • pytest tests/: 136 passed, 9 skipped (unchanged).
  • ruff check: pre-existing F541 warnings on lines this PR didn't touch are left alone.

Test plan

  • Both scripts pass ast.parse.
  • Both scripts now import without ModuleNotFoundError.
  • Existing tests pass.
  • (Manual, post-merge) Run one of the converted writers against real data with --apply to confirm end-to-end behavior.

🤖 Generated with Claude Code

scripts/add_evidence_source.py and scripts/intelligent_snippet_fixer.py
both import EnhancedLiteratureFetcher from communitymech.literature_enhanced
— a module that was never committed to git (only a stale .pyc was
shadowing the missing source locally). Both scripts have raised
ModuleNotFoundError on import for as long as anyone has tried to run
them, which was surfaced as a pre-existing-state heads-up by the recent
writer-conversion PR #87.

Swap to LiteratureFetcher from communitymech.literature, which exposes
the same fetch_pubmed_abstract + fetch_paper surface plus a richer
DOI fallback chain (CrossRef → PubMed via DOI lookup → PMC full-text →
OpenAlex → Semantic Scholar → Europe PMC → publisher meta-tag scrape)
that subsumes what fetch_abstract_for_doi did. API differences:

- fetch_paper returns (abstract, pdf_url) not a dict; tuple-unpack at
  call sites.
- LiteratureFetcher.fetch_paper has no download_pdf kwarg (the older
  version's flag was a no-op in the LiteratureFetcher pipeline; the
  pdf URL is just returned alongside the abstract).
- Title field is unavailable separately. In add_evidence_source.py's
  guess_evidence_source classifier the title was filter(None, …)-merged
  with snippet and abstract anyway; losing it degrades classification
  marginally (PubMed abstracts include the title in the abstract text,
  so PMID references are unaffected). If richer DOI classification is
  needed later, LiteratureFetcher.fetch_doi_metadata() returns CrossRef
  metadata with a title field.

After-state: both scripts now import and run their initialization paths
cleanly. pytest tests/ still passes (136 passed, 9 skipped).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 26, 2026 02:27
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes two writer scripts that currently fail to import due to referencing a non-existent communitymech.literature_enhanced module, by switching them to the in-repo communitymech.literature.LiteratureFetcher API.

Changes:

  • Replace EnhancedLiteratureFetcher imports/usages with LiteratureFetcher in scripts/intelligent_snippet_fixer.py and scripts/add_evidence_source.py.
  • Update call sites to match LiteratureFetcher.fetch_paper() returning (abstract, pdf_url) (tuple-unpack and ignore pdf_url).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
scripts/intelligent_snippet_fixer.py Switches to LiteratureFetcher and updates abstract-fetching logic to use fetch_paper() tuple return.
scripts/add_evidence_source.py Switches to LiteratureFetcher, updates fetch_paper() usage, and adjusts exception handling.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/add_evidence_source.py Outdated
Comment thread scripts/add_evidence_source.py Outdated
Copilot flagged that title was assigned None and then passed through
guess_evidence_source as a parameter that the classifier merged into
its keyword-matching text via filter(None, ...). With title always None
the parameter was dead code that just clutters the call sites.

Remove the title parameter from guess_evidence_source and from both
caller blocks. PubMed abstracts already embed the title in the
abstract text (so PMID-driven classification is unchanged), and
CrossRef titles for DOI references are available via
LiteratureFetcher.fetch_doi_metadata() if richer classification is
wanted later — that's now a clear future-work hook rather than a
hard-coded-None pretense.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@realmarcin realmarcin merged commit 393b373 into main May 26, 2026
@realmarcin realmarcin deleted the fix/broken-literature-imports branch May 26, 2026 02:32
realmarcin added a commit that referenced this pull request May 26, 2026
* Add validate-strict CI workflow

Locks in the 0-error closed-schema baseline established across PRs
#84-#88 so it can't silently regress on future merges. Mirrors the
qc.yaml workflow shipped to TraitMech in PR #77 and adapted for the
CommunityMech runner conventions.

Runs `just validate-strict` + `just audit-writers` + `pytest tests/`
on PRs touching kb/communities/, schema, source, scripts, justfile,
or this workflow. Uploads the categorized TSV reports as workflow
artifacts so reviewers can inspect failures without re-running
locally.

Deliberately scoped narrower than `just qc` — the existing
network-quality.yml handles the network-integrity audit; this new
workflow specifically gates closed-schema validation + writer audit
+ unit tests, which are fast and run on every PR.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address Copilot review on PR #89

Three findings, all addressed:

- Path globs: \`**.py\` doesn't actually match nested subdirs in
  GitHub Actions glob semantics — it behaves like \`*.py\`. Switch
  to \`**/*.py\` so changes under src/communitymech/<subpkg>/ (like
  network/, validators/, embedding/) trigger the workflow.
- Push trigger had no \`paths:\` filter, so the workflow ran on
  every commit to main. Mirror the pull_request path list via a
  YAML anchor (&trigger_paths + *trigger_paths) so the two stay in
  sync.
- uv sync: switch to \`--frozen --all-extras\` so the workflow fails
  if uv.lock is stale (instead of silently re-resolving) while
  keeping the dev/test extras the existing network-quality.yml
  uses. Also added uv.lock + tests/**/*.py to the trigger paths so
  dependency and test changes re-run the workflow.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants