Importer perf: derive comicbox delete_keys from USED set #633

Merged
ajslater merged 1 commit into v1.11-performance from importer-comicbox-delete-keys
Apr 28, 2026

@ajslater
Owner

Summary

Implements the field-projection half of tasks/importer-perf/06-comicbox-side.md (planning PR #627). Independent of #632 (filesystem mtime pre-filter, the other half of sub-plan 06).

After investigating the comicbox internals, the plan's "Improvement A: to_dict(keys=...)" turned out to be ~95% already implemented via comicbox's existing delete_keys config — codex passes delete_keys to skip parse work for unwanted top-level fields, and the most expensive computed action (pages) is already short-circuited when pages is in delete_keys. The substantive runtime win for sub-plan 06 was Improvement D (filesystem mtime pre-filter, PR #632).

What was actually missing:

  1. cover_image was never added to delete_keys despite codex never reading it — comicbox parsed it on every comic and dropped it downstream.
  2. Future comicbox releases that add a new top-level schema field would silently come through as parsed-but-unused work until someone noticed and updated the hand-curated delete_keys.

Change

  • Move USED_COMICBOX_FIELDS to settings/__init__.py as the single source of truth for the codex↔comicbox contract.
  • Derive _COMICBOX_DELETE_KEYS programmatically from ComicboxYamlSubSchema._declared_fields - USED_COMICBOX_FIELDS.
    • New comicbox fields default to deleted (safe default — codex maintainer adds to USED if needed).
    • cover_image is now correctly included.
  • aggregate_path.py imports the constant from settings instead of maintaining its own copy. The runtime _transform_metadata filter still uses the same set, so the codex-side post-extract drop remains the second-line backstop.
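
A minimal sketch of the derivation and the backstop described in the list above. It does not import comicbox: `FakeSubSchema` is a hypothetical stand-in for `ComicboxYamlSubSchema`, the field names and the contents of `USED_COMICBOX_FIELDS` are illustrative rather than the real schema, and `derive_delete_keys` and `drop_unused` are names invented for this example. The only thing assumed about the real schema class is that it exposes a `_declared_fields` mapping keyed by field name.

```python
# Hypothetical stand-in for ComicboxYamlSubSchema. The field names are
# illustrative; only the `_declared_fields` mapping shape is assumed.
class FakeSubSchema:
    _declared_fields = {
        "series": object(),
        "issue": object(),
        "pages": object(),
        "cover_image": object(),
    }


# Illustrative subset of fields the importer reads (not the real list).
USED_COMICBOX_FIELDS = frozenset({"series", "issue"})


def derive_delete_keys(schema, used):
    """Return every declared top-level field that is not in the used set."""
    return frozenset(schema._declared_fields) - used


def drop_unused(metadata, used):
    """Post-extract backstop: keep only the keys that are in the used set."""
    return {key: value for key, value in metadata.items() if key in used}


DELETE_KEYS = derive_delete_keys(FakeSubSchema, USED_COMICBOX_FIELDS)
```

Because the delete set is computed as a difference, a field added to the schema lands in it automatically until someone adds that field to the used set. The `drop_unused` helper shows the same set doing double duty as the post-extract filter.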

Net delete_keys: 12 → 13 entries (cover_image newly included). No other behavior change; this just keeps the existing mechanism honest going forward.

Test plan

  • make fix clean
  • make lint-python clean (0 errors, 0 warnings)
  • pytest tests/importer/ tests/test_search_fts.py — 7 passed
  • Verified derived delete_keys contains the previous 12 entries plus cover_image
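
The last check could be written as a small assertion helper along these lines. This is a sketch under assumptions: `check_delete_keys` is a hypothetical name, and apart from `pages` and `cover_image` the entries in `previous` are placeholders, not the actual 12 hand-curated keys.

```python
def check_delete_keys(previous, derived, expected_new):
    """Confirm derived keeps every previous key and adds exactly expected_new."""
    missing = previous - derived
    added = derived - previous
    assert not missing, f"derived set lost keys: {sorted(missing)}"
    assert added == expected_new, f"unexpected additions: {sorted(added)}"
    return len(previous), len(derived)


# Placeholder data standing in for the hand-curated and derived sets.
previous = frozenset({"pages", "placeholder_a", "placeholder_b"})
derived = previous | {"cover_image"}

counts = check_delete_keys(previous, derived, frozenset({"cover_image"}))
```

With the real sets, the returned counts would correspond to the 12 and 13 entries reported above.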

🤖 Generated with Claude Code

The codex importer already passes a hand-curated delete_keys set to
comicbox so the worker skips parse work for top-level fields the
importer doesn't consume. Two issues with the hand-curation:

1. cover_image was never added to delete_keys despite never being
   read by codex — comicbox parsed it on every comic and dropped it
   downstream.
2. Future comicbox releases that add a new top-level schema field
   would silently start coming through as parsed-but-unused work
   until someone noticed and updated delete_keys.

Move USED_COMICBOX_FIELDS to settings/__init__.py as the single
source of truth for the codex<->comicbox contract, and derive
_COMICBOX_DELETE_KEYS programmatically from
ComicboxYamlSubSchema._declared_fields - USED_COMICBOX_FIELDS. New
comicbox fields default to deleted (safe — codex maintainer adds to
USED if needed); cover_image is now correctly included in the set.

aggregate_path.py imports the constant from settings instead of
maintaining its own copy. The runtime _transform_metadata filter
still uses the same set, so the codex-side post-extract filter
remains the second-line backstop.

Net delete_keys: 12 -> 13 entries (adds cover_image). No other
behavior change; this is the field-projection cleanup half of
tasks/importer-perf/06-comicbox-side.md (Improvement A in spirit;
the existing delete_keys mechanism already implements the runtime
side of the plan).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
ajslater merged commit 6b014bb into v1.11-performance on Apr 28, 2026
1 check failed
ajslater deleted the importer-comicbox-delete-keys branch on May 2, 2026 22:39